y = β₀ + β₁x
Where:
β₀ is the intercept.
β₁ is the slope.
y = β₀ + β₁x + β₂x²
Where:
β₂ is the coefficient of the squared term.
The Curve:
The x² term introduces a curve into the relationship.
If β₂ is positive, the curve opens upward (like a U).
If β₂ is negative, the curve opens downward (like an inverted U).
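The effect of the sign of β₂ can be seen on simulated data. The sketch below is illustrative only: the coefficients used to generate `y_up` and `y_down` are made up and are not taken from the Accra data.

```r
# Simulated data only: the "true" coefficients below are made up for illustration.
set.seed(1)
x <- seq(1, 10, length.out = 50)

# A U-shaped relationship (positive squared term) plus noise.
y_up <- 5 - 2 * x + 0.3 * x^2 + rnorm(50, sd = 0.5)
# An inverted-U relationship (negative squared term) plus noise.
y_down <- 5 + 2 * x - 0.3 * x^2 + rnorm(50, sd = 0.5)

fit_up <- lm(y_up ~ x + I(x^2))
fit_down <- lm(y_down ~ x + I(x^2))

# The third coefficient is the estimate of beta_2.
b2_up <- unname(coef(fit_up)[3])
b2_down <- unname(coef(fit_down)[3])

print(c(b2_up = b2_up, b2_down = b2_down))
```

The estimate for the first fit should come out positive and the second negative, matching the U and inverted-U shapes described above.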
# Descriptive statistics
Cleaned_Accra_MMDAs_Data %>% skim(Population)
| Name | Piped data |
| Number of rows | 134 |
| Number of columns | 76 |
| _______________________ | |
| Column type frequency: | |
| numeric | 1 |
| ________________________ | |
| Group variables | None |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| Population | 1 | 0.99 | 169676.7 | 85308.29 | 53004 | 94831 | 149248 | 223619 | 425518 | ▇▇▅▂▁ |
Cleaned_Accra_MMDAs_Data %>% skim(IGF)
| Name | Piped data |
| Number of rows | 134 |
| Number of columns | 76 |
| _______________________ | |
| Column type frequency: | |
| numeric | 1 |
| ________________________ | |
| Group variables | None |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| IGF | 5 | 0.96 | 3991084 | 3516693 | 23236 | 1394723 | 2977112 | 4969326 | 16317055 | ▇▅▁▁▁ |
# Histograms
ggplot(Cleaned_Accra_MMDAs_Data, aes(x = Population)) +
geom_histogram(bins = 10, fill = "dodgerblue", color = "black") +
labs(title = "Distribution of Population", x = "Population", y = "Frequency") +
scale_x_continuous(labels = comma)
ggplot(Cleaned_Accra_MMDAs_Data, aes(x = IGF)) +
geom_histogram(bins = 10, fill = "dodgerblue", color = "black") +
labs(title = "Distribution of IGF Revenue", x = "IGF Revenue", y = "Frequency") +
scale_x_continuous(labels = comma)
# Growth Rate (Percentage)
Cleaned_Accra_MMDAs_Data <- Cleaned_Accra_MMDAs_Data %>%
mutate(
Population_Growth_Rate = c(NA, diff(Population) / Population[-length(Population)] * 100),
IGF_Growth_Rate = c(NA, diff(IGF) / IGF[-length(IGF)] * 100)
)
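If the rows hold several assemblies observed over 2012-2022, differencing down the whole column also differences across assembly boundaries (the last year of one assembly against the first year of the next). The sketch below shows one way to restart the calculation for each assembly using base R. It runs on toy data, and the `District` identifier column is hypothetical; the real data may name it differently.

```r
# Toy panel: two hypothetical assemblies observed in three years each.
toy <- data.frame(
  District   = rep(c("A", "B"), each = 3),  # hypothetical identifier column
  Year       = rep(2012:2014, times = 2),
  Population = c(100, 110, 121, 200, 210, 231)
)

# Sort so that each assembly's years are consecutive.
toy <- toy[order(toy$District, toy$Year), ]

# Percentage change from the previous year, restarted for every assembly.
pct_change <- function(x) c(NA, diff(x) / x[-length(x)] * 100)
toy$Population_Growth_Rate <- ave(toy$Population, toy$District, FUN = pct_change)

print(toy)
```

The first year of each assembly is `NA`, so no growth rate is computed from one assembly's figures to another's.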
# Plot of Trends
ggplot(Cleaned_Accra_MMDAs_Data, aes(x = Year, y = Population)) +
geom_point(color = "blue") +
geom_smooth(method = "lm", se = TRUE, color = "red", linetype = "dashed") +
labs(
title = "Trends in Population Growth ",
x = "Year (2012-2022)",
y = "Population"
) +
theme(plot.title = element_text(hjust = 0.5))+
scale_y_continuous(labels = comma)
ggplot(Cleaned_Accra_MMDAs_Data, aes(x = Year, y = IGF)) +
geom_point(color = "blue") +
geom_smooth(method = "lm", se = TRUE, color = "red", linetype = "dashed") +
labs(
title = "Trends in IGF Revenue (Ghana Cedis) Growth ",
x = "Year (2012-2022)",
y = "IGF Revenue (Ghana Cedis)"
) +
theme(plot.title = element_text(hjust = 0.5))+
scale_y_continuous(labels = comma)
ggplot(Cleaned_Accra_MMDAs_Data, aes(x = Population, y = IGF)) +
geom_point(color = "blue") +
labs( title = "Population vs. IGF Revenue",
x = "population", y = "IGF Revenue (Ghana Cedis)") +
theme(plot.title = element_text(hjust = 0.5))+
scale_y_continuous(labels = comma)
The histograms show that the distributions of population and IGF revenue are both skewed to the right. The scatter plot shows a positive relationship between population and IGF revenue: as population increases, IGF revenue tends to increase.
mod1 <- lm(IGF ~ Population, data = Cleaned_Accra_MMDAs_Data)
summary(mod1)
##
## Call:
## lm(formula = IGF ~ Population, data = Cleaned_Accra_MMDAs_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6043138 -1775692 -977216 819612 11256715
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1100581.153 652684.396 1.686 0.0942 .
## Population 17.647 3.592 4.912 0.00000274 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3245000 on 126 degrees of freedom
## (6 observations deleted due to missingness)
## Multiple R-squared: 0.1607, Adjusted R-squared: 0.1541
## F-statistic: 24.13 on 1 and 126 DF, p-value: 0.000002739
Cleaned_Accra_MMDAs_Data %>%
ggplot(aes(x = Population, y = IGF)) +
geom_point() +
geom_smooth(method = "lm", se = TRUE) +
labs(x = "Population", y = "IGF Revenue (Ghana Cedis)", title = "Linear Relationship between Population and IGF Revenue") +
scale_y_continuous(labels = scales::comma)
# The Quadratic Term
Cleaned_Accra_MMDAs_Data$Population_Squared <- Cleaned_Accra_MMDAs_Data$Population^2
# Quadratic Regression
mod_quad <- lm(IGF ~ Population + Population_Squared, data = Cleaned_Accra_MMDAs_Data)
summary(mod_quad)
##
## Call:
## lm(formula = IGF ~ Population + Population_Squared, data = Cleaned_Accra_MMDAs_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6946145 -2128240 -639971 1104051 10445638
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4509175.15965575 1282425.99841933 3.516 0.000611 ***
## Population -24.91868849 14.36168841 -1.735 0.085191 .
## Population_Squared 0.00010718 0.00003509 3.055 0.002753 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3143000 on 125 degrees of freedom
## (6 observations deleted due to missingness)
## Multiple R-squared: 0.219, Adjusted R-squared: 0.2065
## F-statistic: 17.53 on 2 and 125 DF, p-value: 0.0000001948
ggplot(Cleaned_Accra_MMDAs_Data, aes(x = Population, y = IGF)) +
geom_point() +
geom_smooth(method = "lm", formula = y ~ x + I(x^2), se = TRUE) + # Use formula for quadratic
labs(x = "Population", y = "IGF Revenue (Ghana Cedis)", title = "Quadratic Relationship between Population and IGF Revenue") +
scale_y_continuous(labels = comma)
Linear Regression:
Coefficients:
Intercept: 1100581.153
Population: 17.647
P-values: Intercept: 0.0942 (not significant at the 5% level)
Population: 0.00000274 (significant)
R-squared: Multiple R-squared: 0.1607
Adjusted R-squared: 0.1541
Interpretation:
There is a statistically significant positive relationship between Population and IGF. For each additional resident, IGF is predicted to increase by approximately 17.65 Ghana Cedis.
However, the Multiple R-squared of 0.1607 indicates that Population explains only 16.07% of the variance in IGF, and the Adjusted R-squared of 0.1541 is similarly low.
Quadratic Regression:
Coefficients: Intercept: 4509175.15965575
Population: -24.91868849
Population_Squared: 0.00010718
P-values: The Population coefficient is the only term that is not significant at the 5% level (p = 0.0852); the intercept (p = 0.0006) and Population_Squared (p = 0.0028) are significant. The overall model is statistically significant (p-value = 0.0000001948).
R-squared: Multiple R-squared: 0.219
Adjusted R-squared: 0.2065
Interpretation: The quadratic model shows a statistically significant relationship between population and IGF revenue. The positive squared term implies a curve that opens upward, and R-squared rises modestly from 0.1607 to 0.219 (adjusted: 0.1541 to 0.2065).
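Because the quadratic coefficients have opposite signs, the fitted curve has a turning point. As a worked illustration, the sketch below plugs the reported estimates into the vertex formula x* = -β₁ / (2β₂) and evaluates the slope at the population quartiles reported by `skim()`. The helper name `marginal_effect` is introduced here for illustration only.

```r
# Coefficients copied from the quadratic regression output above.
b1 <- -24.91868849   # Population
b2 <- 0.00010718     # Population_Squared

# The fitted parabola has zero slope where b1 + 2 * b2 * x = 0.
turning_point <- -b1 / (2 * b2)

# Slope of the fitted curve (Cedis per additional resident) at a given population.
marginal_effect <- function(pop) b1 + 2 * b2 * pop

print(round(turning_point))
print(marginal_effect(c(94831, 149248, 223619)))  # at p25, p50, p75 of Population
```

The turning point falls at roughly 116,000 residents, inside the observed range of 53,004 to 425,518. The fitted slope is therefore negative at the lower quartile and positive from the median upward, although the imprecise Population coefficient means the location of the turning point is itself uncertain.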
# Residual
ggplot(data = data.frame(residuals = residuals(mod1), fitted = fitted(mod1)), aes(x = fitted, y = residuals)) +
geom_point() +
geom_hline(yintercept = 0, linetype = "dashed", color = "red") +
labs(title = "Residuals vs. Fitted (Linear) ", x = "Fitted Values", y = "Residuals")
ggplot(data = data.frame(residuals = residuals(mod1)), aes(x = residuals)) +
geom_histogram(bins = 10, fill = "skyblue", color = "black") +
labs(title = "Histogram of Residuals(Linear)", x = "Residuals")
ggplot(data = data.frame(residuals = residuals(mod1)), aes(sample = residuals)) +
geom_point(stat = "qq") +
stat_qq_line() +
labs(title = "Q-Q Plot of Residuals")
# Residuals vs. Fitted Values
ggplot(data = data.frame(residuals = residuals(mod_quad), fitted = fitted(mod_quad)),
aes(x = fitted, y = residuals)) +
geom_point() +
geom_hline(yintercept = 0, linetype = "dashed", color = "red") +
labs(title = "Residuals vs. Fitted (Quadratic Model)", x = "Fitted Values", y = "Residuals")
# Histogram of Residuals
ggplot(data = data.frame(residuals = residuals(mod_quad)), aes(x = residuals)) +
geom_histogram(bins = 10, fill = "skyblue", color = "black") +
labs(title = "Histogram of Residuals (Quadratic Model)", x = "Residuals")
# Q-Q Plot of Residuals
ggplot(data = data.frame(residuals = residuals(mod_quad)), aes(sample = residuals)) +
geom_point(stat = "qq") +
stat_qq_line() +
labs(title = "Q-Q Plot of Residuals (Quadratic Model)")
shapiro.test(resid(mod1))
##
## Shapiro-Wilk normality test
##
## data: resid(mod1)
## W = 0.83278, p-value = 0.0000000000952
shapiro.test(resid(mod_quad))
##
## Shapiro-Wilk normality test
##
## data: resid(mod_quad)
## W = 0.87419, p-value = 0.000000005049
# Durbin-Watson Test (Autocorrelation)
dwtest(mod1)
##
## Durbin-Watson test
##
## data: mod1
## DW = 0.62936, p-value = 0.000000000000001749
## alternative hypothesis: true autocorrelation is greater than 0
dwtest(mod_quad)
##
## Durbin-Watson test
##
## data: mod_quad
## DW = 0.64655, p-value = 0.000000000000002704
## alternative hypothesis: true autocorrelation is greater than 0
# Breusch-Pagan Test (Homoscedasticity)
bptest(mod1)
##
## studentized Breusch-Pagan test
##
## data: mod1
## BP = 0.61495, df = 1, p-value = 0.4329
bptest(mod_quad)
##
## studentized Breusch-Pagan test
##
## data: mod_quad
## BP = 2.7143, df = 2, p-value = 0.2574
# Variance Inflation Factor (VIF) - Multicollinearity
# VIF requires at least two predictors, so it is reported for the quadratic model only.
vif(mod_quad)
## Population Population_Squared
## 17.03933 17.03933
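Both the linear and quadratic models violate the no-autocorrelation assumption (Durbin-Watson of 0.63 and 0.65, both p < 0.001) and the normality assumption (Shapiro-Wilk p < 0.001), while the Breusch-Pagan tests give no evidence of heteroscedasticity (p = 0.4329 and p = 0.2574). In the quadratic model, the VIF of 17.04 for both terms also indicates strong multicollinearity between Population and Population_Squared.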
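A high VIF is expected when a variable and its square enter the same model, because the two move together over a positive range. One common remedy, not used in the analysis above, is to centre the predictor before squaring it. The sketch below illustrates the idea on simulated populations whose range loosely mirrors the summary statistics; `vif_two` is a hypothetical helper that uses the two-predictor identity VIF = 1 / (1 - r²).

```r
# Simulated populations only; the range loosely mirrors the summary statistics above.
set.seed(42)
pop <- runif(128, min = 53000, max = 425000)

# With two predictors, VIF = 1 / (1 - r^2), where r is their correlation.
vif_two <- function(a, b) 1 / (1 - cor(a, b)^2)

# The raw term and its square are almost collinear.
vif_raw <- vif_two(pop, pop^2)

# Centring before squaring removes most of that overlap.
pop_c <- pop - mean(pop)
vif_centered <- vif_two(pop_c, pop_c^2)

print(c(raw = vif_raw, centered = vif_centered))
```

Centring only re-expresses the same parabola, so the fitted values are unchanged; what changes is the meaning of the linear coefficient, which becomes the slope at the mean population.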
# Transformed Model
log_log_mod <- lm(log(IGF) ~ log(Population), data = Cleaned_Accra_MMDAs_Data)
summary(log_log_mod)
##
## Call:
## lm(formula = log(IGF) ~ log(Population), data = Cleaned_Accra_MMDAs_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.9681 -0.4194 0.0490 0.4937 2.2734
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.8304 2.1396 1.790 0.0758 .
## log(Population) 0.9197 0.1799 5.114 0.00000114 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.989 on 126 degrees of freedom
## (6 observations deleted due to missingness)
## Multiple R-squared: 0.1719, Adjusted R-squared: 0.1653
## F-statistic: 26.15 on 1 and 126 DF, p-value: 0.000001144
# Scatter Plots (Transformed Data)
ggplot(Cleaned_Accra_MMDAs_Data, aes(x = log(Population), y = log(IGF))) + # same transformations as log_log_mod
geom_point() +
geom_smooth(method = "lm") +
labs(title = "Log(Population) vs. Log(IGF Revenue)", x = "Log(Population)", y = "Log(IGF Revenue)")
sqrt_model <- lm(sqrt(IGF) ~ Population, data = Cleaned_Accra_MMDAs_Data)
summary(sqrt_model)
##
## Call:
## lm(formula = sqrt(IGF) ~ Population, data = Cleaned_Accra_MMDAs_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1643.1 -475.0 -104.7 342.6 2275.8
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1088.5423351 151.8126481 7.170 0.0000000000561 ***
## Population 0.0044499 0.0008356 5.326 0.0000004469786 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 754.8 on 126 degrees of freedom
## (6 observations deleted due to missingness)
## Multiple R-squared: 0.1837, Adjusted R-squared: 0.1773
## F-statistic: 28.36 on 1 and 126 DF, p-value: 0.000000447
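After the log-log and square root transformations, the relationship remains statistically significant in both models (log-log: p-value = 0.000001144, R-squared = 0.1719; square root: p-value = 0.000000447, R-squared = 0.1837). Both R-squared values are slightly higher than the simple linear model's 0.1607, although R-squared is not strictly comparable across models whose response variables are on different scales. Among the models fitted to untransformed IGF, the quadratic model still has the highest R-squared (0.219).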
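The slope of a log-log model can be read as an elasticity: the percentage change in IGF associated with a one percent change in Population. As a worked illustration using the reported slope of 0.9197 (the helper name `pct_change_igf` is introduced here only for the example):

```r
# Slope copied from the log-log regression output above.
elasticity <- 0.9197

# Predicted percentage change in IGF for a given percentage change in Population.
pct_change_igf <- function(pct_pop) ((1 + pct_pop / 100)^elasticity - 1) * 100

pct_1  <- pct_change_igf(1)
pct_10 <- pct_change_igf(10)
print(round(c(pct_1, pct_10), 2))
```

A 1% larger population is associated with roughly 0.92% more IGF, and a 10% larger population with roughly 9.2% more, so IGF grows slightly less than proportionally with population under this model.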
# Function to perform diagnostic tests and plots
perform_diagnostics <- function(model, model_name) {
# Residuals vs. Fitted
plot1 <- ggplot(data = data.frame(residuals = residuals(model), fitted = fitted(model)),
aes(x = fitted, y = residuals)) +
geom_point() +
geom_hline(yintercept = 0, linetype = "dashed", color = "red") +
labs(title = paste("Residuals vs. Fitted (", model_name, ")"), x = "Fitted Values", y = "Residuals")
# Histogram of Residuals
plot2 <- ggplot(data = data.frame(residuals = residuals(model)), aes(x = residuals)) +
geom_histogram(bins = 10, fill = "skyblue", color = "black") +
labs(title = paste("Histogram of Residuals (", model_name, ")"), x = "Residuals")
# Q-Q Plot of Residuals
plot3 <- ggplot(data = data.frame(residuals = residuals(model)), aes(sample = residuals)) +
geom_point(stat = "qq") +
stat_qq_line() +
labs(title = paste("Q-Q Plot of Residuals (", model_name, ")"))
# Durbin-Watson Test
dw_test <- dwtest(model)
print(paste("Durbin-Watson Test (", model_name, "):"))
print(dw_test)
# Breusch-Pagan Test
bp_test <- bptest(model)
print(paste("Breusch-Pagan Test (", model_name, "):"))
print(bp_test)
# Print VIF (if applicable)
if (length(coef(model)) > 2) { # Check for multiple predictors
vif_result <- vif(model)
print(paste("VIF (", model_name, "):"))
print(vif_result)
}
# Arrange plots
grid.arrange(plot1, plot2, plot3, nrow = 1)
}
# Perform diagnostics for each model
perform_diagnostics(mod1, "Linear Model")
## [1] "Durbin-Watson Test ( Linear Model ):"
##
## Durbin-Watson test
##
## data: model
## DW = 0.62936, p-value = 0.000000000000001749
## alternative hypothesis: true autocorrelation is greater than 0
##
## [1] "Breusch-Pagan Test ( Linear Model ):"
##
## studentized Breusch-Pagan test
##
## data: model
## BP = 0.61495, df = 1, p-value = 0.4329
perform_diagnostics(log_log_mod, "Log-Log Model")
## [1] "Durbin-Watson Test ( Log-Log Model ):"
##
## Durbin-Watson test
##
## data: model
## DW = 0.92318, p-value = 0.0000000002708
## alternative hypothesis: true autocorrelation is greater than 0
##
## [1] "Breusch-Pagan Test ( Log-Log Model ):"
##
## studentized Breusch-Pagan test
##
## data: model
## BP = 10.113, df = 1, p-value = 0.001473
perform_diagnostics(sqrt_model, "Square Root Model")
## [1] "Durbin-Watson Test ( Square Root Model ):"
##
## Durbin-Watson test
##
## data: model
## DW = 0.70331, p-value = 0.00000000000004657
## alternative hypothesis: true autocorrelation is greater than 0
##
## [1] "Breusch-Pagan Test ( Square Root Model ):"
##
## studentized Breusch-Pagan test
##
## data: model
## BP = 1.1057, df = 1, p-value = 0.293
cor.test(Cleaned_Accra_MMDAs_Data$Population, Cleaned_Accra_MMDAs_Data$IGF)
##
## Pearson's product-moment correlation
##
## data: Cleaned_Accra_MMDAs_Data$Population and Cleaned_Accra_MMDAs_Data$IGF
## t = 4.9122, df = 126, p-value = 0.000002739
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.2443784 0.5370741
## sample estimates:
## cor
## 0.4009076
Overall, the analysis so far finds a moderate, statistically significant positive linear relationship between population and IGF revenue (Pearson’s product-moment correlation coefficient = 0.4009, 95% CI 0.244 to 0.537, p < 0.001). Population size is correlated with IGF revenue performance, but the relationship is far from perfect, and several regression assumptions remain unmet after the transformations: autocorrelation persists in every model, and the log-log model additionally shows heteroscedasticity (Breusch-Pagan p = 0.0015).
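The correlation test and the simple linear regression are two views of the same result. As a check, the sketch below recovers the regression R-squared and the t statistic from the reported correlation alone, using r² and t = r·√df / √(1 − r²).

```r
# Values copied from the cor.test() output above.
r  <- 0.4009076
df <- 126

r_squared <- r^2                            # share of variance explained
t_stat    <- r * sqrt(df) / sqrt(1 - r^2)   # test statistic for H0: correlation = 0
p_value   <- 2 * pt(-abs(t_stat), df = df)  # two-sided p-value

print(c(r_squared = r_squared, t = t_stat, p = p_value))
```

The squared correlation reproduces the Multiple R-squared of 0.1607 from `mod1`, and the t statistic matches the 4.912 reported for the Population coefficient.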
Cleaned_Accra_MMDAs_Data %>% skim(Population)
| Name | Piped data |
| Number of rows | 134 |
| Number of columns | 79 |
| _______________________ | |
| Column type frequency: | |
| numeric | 1 |
| ________________________ | |
| Group variables | None |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| Population | 1 | 0.99 | 169676.7 | 85308.29 | 53004 | 94831 | 149248 | 223619 | 425518 | ▇▇▅▂▁ |
Cleaned_Accra_MMDAs_Data %>% skim(DACF)
| Name | Piped data |
| Number of rows | 134 |
| Number of columns | 79 |
| _______________________ | |
| Column type frequency: | |
| numeric | 1 |
| ________________________ | |
| Group variables | None |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| DACF | 0 | 1 | 3167362 | 5537956 | 0 | 1584136 | 2410939 | 3468858 | 64171193 | ▇▁▁▁▁ |
# Histograms
ggplot(Cleaned_Accra_MMDAs_Data, aes(x = Population)) +
geom_histogram(bins = 10, fill = "dodgerblue", color = "black") +
labs(title = "Distribution of Population", x = "Population")
ggplot(Cleaned_Accra_MMDAs_Data, aes(x = DACF)) +
geom_histogram(bins = 10, fill = "dodgerblue", color = "black") +
labs(title = "Distribution of DACF Revenue", x = "DACF Revenue")
# Plot of Trends
ggplot(Cleaned_Accra_MMDAs_Data, aes(x = Year, y = Population)) +
geom_point(color = "blue") +
geom_smooth(method = "lm", se = TRUE, color = "red", linetype = "dashed") +
labs(
title = "Trends in Population Growth ",
x = "Year (2012-2022)",
y = "Population"
) +
theme(plot.title = element_text(hjust = 0.5))+
scale_y_continuous(labels = comma)
ggplot(Cleaned_Accra_MMDAs_Data, aes(x = Year, y = DACF)) +
geom_point(color = "blue") +
geom_smooth(method = "lm", se = TRUE, color = "red", linetype = "dashed") +
labs(
title = "Trends in DACF Revenue (Ghana Cedis) Growth ",
x = "Year (2012-2022)",
y = "DACF Revenue (Ghana Cedis)"
) +
theme(plot.title = element_text(hjust = 0.5))+
scale_y_continuous(labels = comma)
ggplot(Cleaned_Accra_MMDAs_Data, aes(x = Population, y = DACF)) +
geom_point(color = "blue") +
labs( title = "Population vs. DACF Revenue",
x = "population", y = "DACF Revenue (Ghana Cedis)") +
theme(plot.title = element_text(hjust = 0.5))+
scale_y_continuous(labels = comma)
The histograms show that both population and DACF revenue are right-skewed, and DACF contains a potential outlier (its maximum of 64,171,193 is far above the 75th percentile of 3,468,858). The scatter plot shows a weak relationship between population and DACF revenue that does not appear to be linear.
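One way to judge how much a single extreme value drives a fit is Cook's distance, which the analysis here does not compute. The sketch below is a minimal illustration on simulated data with one planted outlier; none of the numbers come from the Accra data, and the object names are hypothetical.

```r
# Simulated data only: 30 well-behaved points plus one planted extreme value.
set.seed(7)
d <- data.frame(x = runif(31, min = 50000, max = 400000))
d$y <- 2e6 + 3 * d$x + rnorm(31, sd = 5e5)
d$y[31] <- 6e7  # planted outlier, loosely echoing the largest DACF value above

fit <- lm(y ~ x, data = d)

# Cook's distance measures how much the fit moves when a row is dropped.
cooks <- cooks.distance(fit)
most_influential <- unname(which.max(cooks))

# Refit without that row to see how far the coefficients shift.
fit_trimmed <- lm(y ~ x, data = d[-most_influential, ])
print(most_influential)
print(rbind(all_rows = coef(fit), trimmed = coef(fit_trimmed)))
```

Comparing the two sets of coefficients shows whether conclusions depend on one observation; applied to the DACF model, the same comparison would indicate how much the largest value matters.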
mod2 <- lm(DACF ~ Population, data = Cleaned_Accra_MMDAs_Data)
summary(mod2)
##
## Call:
## lm(formula = DACF ~ Population, data = Cleaned_Accra_MMDAs_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3722082 -1560495 -635551 488389 60421667
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4154836.128 1075738.100 3.862 0.000176 ***
## Population -5.735 5.669 -1.012 0.313501
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5556000 on 131 degrees of freedom
## (1 observation deleted due to missingness)
## Multiple R-squared: 0.007754, Adjusted R-squared: 0.0001798
## F-statistic: 1.024 on 1 and 131 DF, p-value: 0.3135
Cleaned_Accra_MMDAs_Data %>%
ggplot(aes(x = Population, y = DACF)) +
geom_point() +
geom_smooth(method = "lm", se = TRUE) + # se = TRUE draws the confidence band
labs(x = "Population", y = "DACF Revenue (Ghana Cedis)", title = "Linear Relationship between Population and DACF Revenue") +
scale_y_continuous(labels = scales::comma)
# Quadratic Regression
# Stored under a new name so the IGF quadratic model (mod_quad) is not overwritten
mod2_quad <- lm(DACF ~ Population + Population_Squared, data = Cleaned_Accra_MMDAs_Data)
summary(mod2_quad)
##
## Call:
## lm(formula = DACF ~ Population + Population_Squared, data = Cleaned_Accra_MMDAs_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4222408 -1696001 -495436 576125 59843864
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 6207131.11715897 2248941.43187925 2.760 0.00662 **
## Population -30.97832015 24.94618474 -1.242 0.21654
## Population_Squared 0.00006195 0.00005962 1.039 0.30071
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5554000 on 130 degrees of freedom
## (1 observation deleted due to missingness)
## Multiple R-squared: 0.01593, Adjusted R-squared: 0.0007872
## F-statistic: 1.052 on 2 and 130 DF, p-value: 0.3522
The regression results show no statistically significant linear relationship between population and DACF revenue performance (p-value: 0.3135, R-squared: 0.007754, Adjusted R-squared: 0.0001798). The negative Population coefficient suggests a slight downward trend, but it is not statistically significant. The quadratic model is not significant either (F-test p-value: 0.3522).
# Residual
ggplot(data = data.frame(residuals = residuals(mod2),
fitted = fitted(mod2)),
aes(x = fitted, y = residuals)) +
geom_point() +
geom_hline(yintercept = 0, linetype = "dashed", color = "red") +
labs(title = "Residuals vs. Fitted",
x = "Fitted Values", y = "Residuals")
ggplot(data = data.frame(residuals = residuals(mod2)),
aes(x = residuals)) +
geom_histogram(bins = 10, fill = "skyblue", color = "black") +
labs(title = "Histogram of Residuals", x = "Residuals")
ggplot(data = data.frame(residuals = residuals(mod2)),
aes(sample = residuals)) +
stat_qq() +
stat_qq_line() +
labs(title = "Q-Q Plot of Residuals ")
shapiro.test(resid(mod2))
##
## Shapiro-Wilk normality test
##
## data: resid(mod2)
## W = 0.27024, p-value < 0.00000000000000022
# Autocorrelation
dwtest(mod2)
##
## Durbin-Watson test
##
## data: mod2
## DW = 2.0154, p-value = 0.511
## alternative hypothesis: true autocorrelation is greater than 0
# Homoscedasticity (Constant Variance of Residuals)
bptest(mod2)
##
## studentized Breusch-Pagan test
##
## data: mod2
## BP = 1.4442, df = 1, p-value = 0.2295
# Multicollinearity
# Not applicable: this is a simple linear regression with a single predictor (Population).
# Multivariate Normality
# Not assessed separately: with a single predictor, normality is checked on the residuals (Shapiro-Wilk test above).
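The tests of the linear regression assumptions show that the residuals are not normally distributed (Shapiro-Wilk W = 0.27, p < 0.001); the tests give no evidence against the other assumptions (Durbin-Watson p = 0.511, Breusch-Pagan p = 0.2295).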
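A Durbin-Watson statistic can be translated into an approximate first-order autocorrelation through DW ≈ 2(1 − ρ). As a worked illustration with the reported statistics (the helper name `implied_rho` is introduced here only for the example):

```r
# First-order autocorrelation implied by a Durbin-Watson statistic,
# using the approximation DW = 2 * (1 - rho).
implied_rho <- function(dw) 1 - dw / 2

# DW values copied from the test output above.
rho_igf_linear  <- implied_rho(0.62936)  # mod1: IGF ~ Population
rho_dacf_linear <- implied_rho(2.0154)   # mod2: DACF ~ Population

print(round(c(IGF = rho_igf_linear, DACF = rho_dacf_linear), 3))
```

The IGF residuals imply an autocorrelation of about 0.69 between neighbouring rows, while the DACF residuals imply essentially none. Because the statistic depends on row order, its interpretation rests on how the rows of the data set are sorted.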
# Transformed Models
# DACF contains zeros (p0 = 0 above), so 1 is added before taking logs
Cleaned_Accra_MMDAs_Data$DACF_adjusted <- Cleaned_Accra_MMDAs_Data$DACF + 1
log_mod2 <- lm(log(DACF_adjusted) ~ log(Population), data = Cleaned_Accra_MMDAs_Data)
summary(log_mod2)
##
## Call:
## lm(formula = log(DACF_adjusted) ~ log(Population), data = Cleaned_Accra_MMDAs_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -14.3214 -0.2674 0.1536 0.4779 3.6785
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 10.4161 2.9779 3.498 0.000641 ***
## log(Population) 0.3477 0.2497 1.393 0.166050
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.447 on 131 degrees of freedom
## (1 observation deleted due to missingness)
## Multiple R-squared: 0.01459, Adjusted R-squared: 0.007069
## F-statistic: 1.94 on 1 and 131 DF, p-value: 0.166
sqrt_mod2 <- lm(sqrt(DACF) ~ sqrt(Population), data = Cleaned_Accra_MMDAs_Data)
summary(sqrt_mod2)
##
## Call:
## lm(formula = sqrt(DACF) ~ sqrt(Population), data = Cleaned_Accra_MMDAs_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1653.2 -359.2 -73.4 234.7 6355.4
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1716.7906 263.9496 6.504 0.00000000151 ***
## sqrt(Population) -0.2314 0.6408 -0.361 0.719
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 742.3 on 131 degrees of freedom
## (1 observation deleted due to missingness)
## Multiple R-squared: 0.0009948, Adjusted R-squared: -0.006631
## F-statistic: 0.1304 on 1 and 131 DF, p-value: 0.7185
# Scatter Plots (Transformed Data)
# DACF_adjusted (DACF + 1) is plotted because log(0) is -Inf for zero DACF values
ggplot(Cleaned_Accra_MMDAs_Data, aes(x = log(Population), y = log(DACF_adjusted))) +
geom_point() +
geom_smooth(method = "lm") +
labs(title = "Log(Population) vs. Log(DACF Revenue + 1)",
x = "Log(Population)", y = "Log(DACF Revenue + 1)")
# Square root scale, matching sqrt_mod2
ggplot(Cleaned_Accra_MMDAs_Data, aes(x = sqrt(Population), y = sqrt(DACF))) +
geom_point() +
geom_smooth(method = "lm") +
labs(title = "Sqrt(Population) vs. Sqrt(DACF Revenue)",
x = "Sqrt(Population)", y = "Sqrt(DACF Revenue)")
# Function to perform diagnostic tests and plots
perform_diagnostics <- function(model, model_name) {
# Residuals vs. Fitted
plot1 <- ggplot(data = data.frame(residuals = residuals(model), fitted = fitted(model)),
aes(x = fitted, y = residuals)) +
geom_point() +
geom_hline(yintercept = 0, linetype = "dashed", color = "red") +
labs(title = paste("Residuals vs. Fitted (", model_name, ")"), x = "Fitted Values", y = "Residuals")
# Histogram of Residuals
plot2 <- ggplot(data = data.frame(residuals = residuals(model)), aes(x = residuals)) +
geom_histogram(bins = 10, fill = "skyblue", color = "black") +
labs(title = paste("Histogram of Residuals (", model_name, ")"), x = "Residuals")
# Q-Q Plot of Residuals
plot3 <- ggplot(data = data.frame(residuals = residuals(model)), aes(sample = residuals)) +
geom_point(stat = "qq") +
stat_qq_line() +
labs(title = paste("Q-Q Plot of Residuals (", model_name, ")"))
# Durbin-Watson Test
dw_test <- dwtest(model)
print(paste("Durbin-Watson Test (", model_name, "):"))
print(dw_test)
# Breusch-Pagan Test
bp_test <- bptest(model)
print(paste("Breusch-Pagan Test (", model_name, "):"))
print(bp_test)
# Print VIF (if applicable)
if (length(coef(model)) > 2) { # Check for multiple predictors
vif_result <- vif(model)
print(paste("VIF (", model_name, "):"))
print(vif_result)
}
# Arrange plots
grid.arrange(plot1, plot2, plot3, nrow = 1)
}
# Perform diagnostics for each model
perform_diagnostics(mod2, "Linear Model")
## [1] "Durbin-Watson Test ( Linear Model ):"
##
## Durbin-Watson test
##
## data: model
## DW = 2.0154, p-value = 0.511
## alternative hypothesis: true autocorrelation is greater than 0
##
## [1] "Breusch-Pagan Test ( Linear Model ):"
##
## studentized Breusch-Pagan test
##
## data: model
## BP = 1.4442, df = 1, p-value = 0.2295
perform_diagnostics(log_mod2, "Log-Log Model")
## [1] "Durbin-Watson Test ( Log-Log Model ):"
##
## Durbin-Watson test
##
## data: model
## DW = 1.9062, p-value = 0.2699
## alternative hypothesis: true autocorrelation is greater than 0
##
## [1] "Breusch-Pagan Test ( Log-Log Model ):"
##
## studentized Breusch-Pagan test
##
## data: model
## BP = 2.4445, df = 1, p-value = 0.1179
perform_diagnostics(sqrt_mod2, "Square Root Model")
## [1] "Durbin-Watson Test ( Square Root Model ):"
##
## Durbin-Watson test
##
## data: model
## DW = 1.7832, p-value = 0.09263
## alternative hypothesis: true autocorrelation is greater than 0
##
## [1] "Breusch-Pagan Test ( Square Root Model ):"
##
## studentized Breusch-Pagan test
##
## data: model
## BP = 2.5715, df = 1, p-value = 0.1088
shapiro.test(resid(mod2))
##
## Shapiro-Wilk normality test
##
## data: resid(mod2)
## W = 0.27024, p-value < 0.00000000000000022
shapiro.test(resid(log_mod2))
##
## Shapiro-Wilk normality test
##
## data: resid(log_mod2)
## W = 0.46366, p-value < 0.00000000000000022
shapiro.test(resid(sqrt_mod2))
##
## Shapiro-Wilk normality test
##
## data: resid(sqrt_mod2)
## W = 0.67538, p-value = 0.0000000000000009358
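Neither the log-log nor the square root model is statistically significant (p-value: 0.166 and 0.7185). The transformations bring the residuals somewhat closer to normality (Shapiro-Wilk W rises from 0.27 to 0.46 and 0.68), but the normality assumption is still rejected in every model.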
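The very large negative residual in the log-log model (minimum of -14.32) is the size one would expect from a row with zero DACF. As a worked illustration, the sketch below uses the reported coefficients to compute the residual that a zero-revenue row would receive at the median population; this is an arithmetic check and not a confirmation of which row produced the minimum.

```r
# Coefficients copied from the log-log DACF output above.
b0 <- 10.4161
b1 <- 0.3477

# After adding 1, a DACF of zero becomes log(1) = 0 on the response scale.
log_zero_dacf <- log(0 + 1)

# Fitted log(DACF_adjusted) at the median population reported by skim().
fitted_at_median <- b0 + b1 * log(149248)

# Residual such a zero-revenue row would receive at that population.
residual_zero_row <- log_zero_dacf - fitted_at_median
print(round(c(fitted = fitted_at_median, residual = residual_zero_row), 2))
```

A residual of about -14.6 at the median population is close to the observed minimum, which suggests that the zero values in DACF, rather than the bulk of the data, are a likely source of the remaining non-normality in the log-log residuals.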
In summary, none of the models is statistically significant and the normality assumption is not met in any of them. On this evidence, it cannot be concluded that changes in population reliably predict changes in DACF revenue performance; any observed pattern could plausibly be due to chance.
Recurrent expenditure is NA in the data.
Cleaned_Accra_MMDAs_Data %>% skim(Capital_Expenditure)
| Name | Piped data |
| Number of rows | 134 |
| Number of columns | 80 |
| _______________________ | |
| Column type frequency: | |
| numeric | 1 |
| ________________________ | |
| Group variables | None |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| Capital_Expenditure | 6 | 0.96 | 3003494 | 2232785 | 0 | 1485120 | 2420991 | 4347636 | 14576636 | ▇▅▁▁▁ |
# Capital Expenditure Histogram
cap_hist <- ggplot(Cleaned_Accra_MMDAs_Data, aes(x = Capital_Expenditure)) +
geom_histogram(aes(y = after_stat(density)), bins = 10, fill = "skyblue", color = "black") +
geom_density(color = "red") +
labs(title = "Distribution of Capital Expenditure", x = "Capital Expenditure (Ghana Cedis)", y = "Density") +
scale_x_continuous(labels = comma)
# Population Histogram
pop_hist <- ggplot(Cleaned_Accra_MMDAs_Data, aes(x = Population)) +
geom_histogram(aes(y = after_stat(density)), bins = 10, fill = "dodgerblue", color = "black") +
geom_density(color = "red") +
labs(title = "Distribution of Population", x = "Population", y = "Density") +
scale_x_continuous(labels = comma)
cap_hist
pop_hist
ggplot(Cleaned_Accra_MMDAs_Data, aes(x = Year, y = Population)) +
geom_point(color = "blue") +
geom_smooth(method = "lm", se = TRUE, color = "red", linetype = "dashed") +
labs(
title = "Population Trend",
x = "Year (2012-2022)",
y = "Population"
) +
theme(plot.title = element_text(hjust = 0.5))+
scale_y_continuous(labels = comma)
ggplot(Cleaned_Accra_MMDAs_Data, aes(x = Year, y = Capital_Expenditure)) +
geom_point(color = "blue") +
geom_smooth(method = "lm", se = TRUE, color = "red", linetype = "dashed") +
labs(
title = "Capital Expenditure Trend",
x = "Year (2012-2022)",
y = "Capital Expenditure"
) +
theme(plot.title = element_text(hjust = 0.5))+
scale_y_continuous(labels = comma)
ggplot(Cleaned_Accra_MMDAs_Data, aes(x = Population, y = Capital_Expenditure)) +
geom_point(color = "blue") +
labs(title = "Population vs. Capital Expenditure",
x = "Population", y = "Capital Expenditure (Ghana Cedis)") +
theme(plot.title = element_text(hjust = 0.5))+
scale_y_continuous(labels = comma)
# Calculate Per Capita Values
Cleaned_Accra_MMDAs_Data$Capital_Exp_Per_Capita <- Cleaned_Accra_MMDAs_Data$Capital_Expenditure / Cleaned_Accra_MMDAs_Data$Population
# Per Capita Analysis
# na.rm = TRUE is needed because Capital_Exp_Per_Capita contains missing values
average_capita <- mean(Cleaned_Accra_MMDAs_Data$Capital_Exp_Per_Capita, na.rm = TRUE)
ggplot(Cleaned_Accra_MMDAs_Data, aes(x = Year)) +
geom_point(aes(y = Capital_Exp_Per_Capita), color = "blue") +
labs(title = "Capital Expenditure Per Capita Over Time", x = "Year (2012 - 2022)", y = "Ghana Cedis Per Capita") +
scale_y_continuous(labels = comma)
mod3 <- lm(Capital_Expenditure ~ Population, data = Cleaned_Accra_MMDAs_Data)
summary(mod3)
##
## Call:
## lm(formula = Capital_Expenditure ~ Population, data = Cleaned_Accra_MMDAs_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3048586 -1638009 -649028 1116008 11122904
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2039122.193 444409.755 4.588 0.0000107 ***
## Population 5.602 2.300 2.436 0.0163 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2197000 on 125 degrees of freedom
## (7 observations deleted due to missingness)
## Multiple R-squared: 0.04532, Adjusted R-squared: 0.03768
## F-statistic: 5.934 on 1 and 125 DF, p-value: 0.01626
mod_cap <- lm(Capital_Expenditure ~ Population, data = Cleaned_Accra_MMDAs_Data)
summary(mod_cap)
##
## Call:
## lm(formula = Capital_Expenditure ~ Population, data = Cleaned_Accra_MMDAs_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3048586 -1638009 -649028 1116008 11122904
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2039122.193 444409.755 4.588 0.0000107 ***
## Population 5.602 2.300 2.436 0.0163 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2197000 on 125 degrees of freedom
## (7 observations deleted due to missingness)
## Multiple R-squared: 0.04532, Adjusted R-squared: 0.03768
## F-statistic: 5.934 on 1 and 125 DF, p-value: 0.01626
Cleaned_Accra_MMDAs_Data %>%
ggplot(aes(x = Population, y = Capital_Expenditure)) +
geom_point()+
geom_smooth(method = "lm", se = TRUE) +
labs(x = "Population", y = "Capital Expenditure", title = "Linear Relationship between Population and Capital Expenditure") +
scale_y_continuous(labels = scales::comma)
The linear regression shows a statistically significant positive relationship between population and capital expenditure (slope = 5.602, p = 0.0163). The fit is weak, however: with R-squared = 0.04532, population explains only about 4.5% of the variation in capital expenditure.
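To make the size of the slope concrete, the fitted line can be evaluated by hand. The sketch below copies the coefficients from the summary(mod3) output and uses the median population reported by skim(Population) as an illustrative input. The variable names are chosen here for illustration only, and the result is a point on the fitted line, not a forecast.

```r
# Coefficients copied from the summary(mod3) output above
intercept <- 2039122.193
slope     <- 5.602   # cedis of capital expenditure per additional resident

# Illustrative input: the median population reported by skim(Population)
pop <- 149248

# Value of the fitted line at that population
predicted_capex <- intercept + slope * pop
predicted_capex

# Expected difference in capital expenditure for 10,000 additional residents
slope * 10000
```

Given the residual standard error of about 2.2 million cedis, individual observations can lie far from this fitted value.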
# Diagnostic Function
perform_diagnostics <- function(model, model_name) {
# Residuals vs. Fitted
plot1 <- ggplot(data = data.frame(residuals = residuals(model), fitted = fitted(model)),
aes(x = fitted, y = residuals)) +
geom_point() +
geom_hline(yintercept = 0, linetype = "dashed", color = "red") +
labs(title = paste("Residuals vs. Fitted (", model_name, ")"), x = "Fitted Values", y = "Residuals")
# Histogram of Residuals
plot2 <- ggplot(data = data.frame(residuals = residuals(model)), aes(x = residuals)) +
geom_histogram(bins = 10, fill = "skyblue", color = "black") +
labs(title = paste("Histogram of Residuals (", model_name, ")"), x = "Residuals")
# Q-Q Plot of Residuals
plot3 <- ggplot(data = data.frame(residuals = residuals(model)), aes(sample = residuals)) +
geom_point(stat = "qq") +
stat_qq_line() +
labs(title = paste("Q-Q Plot of Residuals (", model_name, ")"))
# Durbin-Watson Test
dw_test <- dwtest(model)
print(paste("Durbin-Watson Test (", model_name, "):"))
print(dw_test)
# Breusch-Pagan Test
bp_test <- bptest(model)
print(paste("Breusch-Pagan Test (", model_name, "):"))
print(bp_test)
# Print VIF (if applicable)
if (length(coef(model)) > 2) { # Check for multiple predictors
vif_result <- vif(model)
print(paste("VIF (", model_name, "):"))
print(vif_result)
}
# Arrange plots
grid.arrange(plot1, plot2, plot3, nrow = 1)
}
# Perform Diagnostics
# Capital Expenditure
perform_diagnostics(mod3, "Capital Expenditure Model")
## [1] "Durbin-Watson Test ( Capital Expenditure Model ):"
##
## Durbin-Watson test
##
## data: model
## DW = 1.0642, p-value = 0.00000003825
## alternative hypothesis: true autocorrelation is greater than 0
##
## [1] "Breusch-Pagan Test ( Capital Expenditure Model ):"
##
## studentized Breusch-Pagan test
##
## data: model
## BP = 0.057858, df = 1, p-value = 0.8099
shapiro.test(resid(mod3))
##
## Shapiro-Wilk normality test
##
## data: resid(mod3)
## W = 0.89149, p-value = 0.00000003736
# Recurrent Expenditure
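The diagnostics show that the linear model violates the no-autocorrelation and normality assumptions of linear regression (Durbin-Watson: DW = 1.0642, p < 0.001; Shapiro-Wilk: W = 0.89149, p < 0.001). The Breusch-Pagan test gives no evidence of heteroscedasticity (BP = 0.057858, p = 0.8099).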
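As a rough guide to how strong the autocorrelation is, the Durbin-Watson statistic can be converted to an approximate first-order autocorrelation using the standard approximation DW ≈ 2(1 − ρ). The sketch below applies it to the reported DW value and checks the hand-computed statistic on simulated residuals. The helper names dw_stat and dw_to_rho are illustrative and not part of lmtest, and the simulated series is toy data rather than the MMDA data.

```r
# Durbin-Watson statistic computed by hand from a vector of residuals
dw_stat <- function(e) sum(diff(e)^2) / sum(e^2)

# Approximate first-order autocorrelation implied by a DW value
dw_to_rho <- function(dw) 1 - dw / 2

# DW value reported for the capital expenditure model above
dw_to_rho(1.0642)

# Check on simulated AR(1) residuals (toy data, not the MMDA data)
set.seed(1)
e <- as.numeric(arima.sim(model = list(ar = 0.5), n = 500))
dw_stat(e)
dw_to_rho(dw_stat(e))
```

Because the statistic depends on the order of the rows, the result is only meaningful if the data are sorted in a sensible sequence (for example by year within each assembly).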
Cleaned_Accra_MMDAs_Data$Recrrent_Expenditure_squared <- Cleaned_Accra_MMDAs_Data$Recrrent_Expenditure^2
Cleaned_Accra_MMDAs_Data$Capital_Expenditure_squared <- Cleaned_Accra_MMDAs_Data$Capital_Expenditure^2
mod_quad <- lm(cbind(Capital_Expenditure) ~ Population + Population_Squared, data = Cleaned_Accra_MMDAs_Data)
# View the summary
summary(mod_quad)
##
## Call:
## lm(formula = cbind(Capital_Expenditure) ~ Population + Population_Squared,
## data = Cleaned_Accra_MMDAs_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2976473 -1658166 -611487 1134668 11190838
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2421761.56908601 944502.61295085 2.564 0.0115 *
## Population 0.99419964 10.28986386 0.097 0.9232
## Population_Squared 0.00001118 0.00002434 0.460 0.6467
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2204000 on 124 degrees of freedom
## (7 observations deleted due to missingness)
## Multiple R-squared: 0.04694, Adjusted R-squared: 0.03157
## F-statistic: 3.054 on 2 and 124 DF, p-value: 0.05074
# Scatter Plots (Transformed Data)
ggplot(Cleaned_Accra_MMDAs_Data, aes(x = Population, y = Capital_Expenditure)) +
geom_point() +
geom_smooth(method = "lm", formula = y ~ x + I(x^2), se = TRUE) +
labs(x = "Population", y = "Capital Expenditure (Ghana Cedis)", title = "Quadratic Relationship between Population and Capital Expenditure") +
scale_y_continuous(labels = comma)
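The quadratic model does not improve on the linear model of population and capital expenditure. The adjusted R-squared falls from 0.03768 to 0.03157, neither the linear term (p = 0.9232) nor the squared term (p = 0.6467) is individually significant, and the overall F-test is marginally non-significant at the 0.05 level (p = 0.05074).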
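A formal way to compare the linear and quadratic specifications is a partial F-test on the nested models. The sketch below runs anova() on simulated toy data (not the MMDA data), with x standing in for population in thousands. The names toy, fit_linear, and fit_quad are illustrative.

```r
set.seed(42)
# Simulated toy data: x plays the role of population (in thousands)
toy <- data.frame(x = runif(100, 50, 400))
toy$y <- 2000 + 5 * toy$x + rnorm(100, sd = 2000)

fit_linear <- lm(y ~ x, data = toy)
fit_quad   <- lm(y ~ x + I(x^2), data = toy)

# Partial F-test: does the squared term add explanatory power?
comparison <- anova(fit_linear, fit_quad)
comparison

# With a single added term, the F statistic equals the squared t value of that term
t_quad <- summary(fit_quad)$coefficients["I(x^2)", "t value"]
all.equal(comparison$F[2], t_quad^2)
```

Applied to the models above, the same identity means the partial F-test carries the same p-value as the Population_Squared term (0.6467).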
# Log Transformation for Capital Expenditure
Cleaned_Accra_MMDAs_Data$Capital_Expenditure_adjusted <- Cleaned_Accra_MMDAs_Data$Capital_Expenditure + 1
log_cap_mod <- lm(log(Capital_Expenditure_adjusted) ~ Population, data = Cleaned_Accra_MMDAs_Data)
summary(log_cap_mod)
##
## Call:
## lm(formula = log(Capital_Expenditure_adjusted) ~ Population,
## data = Cleaned_Accra_MMDAs_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -13.3169 -0.3649 0.4767 1.1044 2.3706
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 12.654142532 0.532370980 23.769 < 0.0000000000000002 ***
## Population 0.000008784 0.000002755 3.188 0.00181 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.632 on 125 degrees of freedom
## (7 observations deleted due to missingness)
## Multiple R-squared: 0.07521, Adjusted R-squared: 0.06781
## F-statistic: 10.17 on 1 and 125 DF, p-value: 0.001808
perform_diagnostics(log_cap_mod, "Log capital Expenditure Model")
## [1] "Durbin-Watson Test ( Log capital Expenditure Model ):"
##
## Durbin-Watson test
##
## data: model
## DW = 0.96763, p-value = 0.000000001577
## alternative hypothesis: true autocorrelation is greater than 0
##
## [1] "Breusch-Pagan Test ( Log capital Expenditure Model ):"
##
## studentized Breusch-Pagan test
##
## data: model
## BP = 7.0361, df = 1, p-value = 0.007988
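The slope of a model with a logged response is easier to read as a percentage change. The sketch below converts the Population coefficient from the summary(log_cap_mod) output into the approximate percentage change in (capital expenditure + 1) for 10,000 additional residents. The 10,000 step is an arbitrary illustration. The sketch also shows why 1 was added before taking logs.

```r
# Slope copied from the summary(log_cap_mod) output above
b_pop <- 0.000008784

# Approximate percentage change in (capital expenditure + 1)
# for an increase of 10,000 residents
pct_change <- (exp(b_pop * 10000) - 1) * 100
pct_change

# log(x + 1) is what log1p(x) computes: a zero expenditure maps to 0,
# whereas log(0) is -Inf and cannot be used as a response value
log1p(0)
log(0)
```

A zero mapped to 0 sits far below the other logged values, which is consistent with the very large negative minimum residual (-13.3169) in the model summary.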
Cleaned_Accra_MMDAs_Data$Ln_Population <- log(Cleaned_Accra_MMDAs_Data$Population)
Cleaned_Accra_MMDAs_Data$Ln_Capital_Expenditure <- log(Cleaned_Accra_MMDAs_Data$Capital_Expenditure)
ggplot(Cleaned_Accra_MMDAs_Data, aes(x = log(Population), y = log(Capital_Expenditure))) +
geom_point() +
geom_smooth(method = "lm", se = TRUE)+
labs(title = "Log(Population) vs. Log(Capital Expenditure)",
x = "Log(Population)", y = "Log(Capital Expenditure)")
# Square root transformation for Capital Expenditure
sqrt_cap_mod <- lm(sqrt(Capital_Expenditure) ~ Population, data = Cleaned_Accra_MMDAs_Data)
summary(sqrt_cap_mod)
##
## Call:
## lm(formula = sqrt(Capital_Expenditure) ~ Population, data = Cleaned_Accra_MMDAs_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1391.79 -444.70 -45.67 440.98 2052.16
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1232.4143446 132.5196479 9.30 0.000000000000000587 ***
## Population 0.0021123 0.0006858 3.08 0.00255 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 655.1 on 125 degrees of freedom
## (7 observations deleted due to missingness)
## Multiple R-squared: 0.07054, Adjusted R-squared: 0.06311
## F-statistic: 9.487 on 1 and 125 DF, p-value: 0.002545
perform_diagnostics(sqrt_cap_mod, "Square root Capital Expenditure Model")
## [1] "Durbin-Watson Test ( Square root Capital Expenditure Model ):"
##
## Durbin-Watson test
##
## data: model
## DW = 1.0006, p-value = 0.00000000484
## alternative hypothesis: true autocorrelation is greater than 0
##
## [1] "Breusch-Pagan Test ( Square root Capital Expenditure Model ):"
##
## studentized Breusch-Pagan test
##
## data: model
## BP = 4.9973, df = 1, p-value = 0.02539
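After the transformations, the recurrent expenditure models remain significant and meet the assumptions, but the capital expenditure models do not: both the log and square-root models still show positive autocorrelation (Durbin-Watson p < 0.001), and the Breusch-Pagan test now indicates heteroscedasticity (p = 0.007988 and p = 0.02539, respectively).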
From the regression analysis above, the relationship between population and capital expenditure is positive, linear, and statistically significant, but weak, with a Pearson product-moment correlation of r = 0.2128856.
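The correlation quoted above can be recovered from the regression output, because in a simple linear regression the Pearson correlation has the sign of the slope and a magnitude equal to the square root of R-squared. The sketch below uses the rounded values from summary(mod3), so it matches the reported correlation only to about five decimal places. The check at the end uses simulated toy data, not the MMDA data.

```r
# Values copied from the summary(mod3) output above
r_squared <- 0.04532
slope     <- 5.602

# r has the sign of the slope and magnitude sqrt(R-squared)
r <- sign(slope) * sqrt(r_squared)
r

# Check of the identity on simulated toy data (not the MMDA data)
set.seed(7)
x <- rnorm(50)
y <- 1 + 0.3 * x + rnorm(50)
all.equal(cor(x, y)^2, summary(lm(y ~ x))$r.squared)
```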
The next analysis relates total revenue growth rate to infrastructure delivery (capital expenditure per capita).
# Descriptive statistics
Cleaned_Accra_MMDAs_Data %>% skim(Capital_Exp_Per_Capita)
| Name | Piped data |
| Number of rows | 134 |
| Number of columns | 85 |
| _______________________ | |
| Column type frequency: | |
| numeric | 1 |
| ________________________ | |
| Group variables | None |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| Capital_Exp_Per_Capita | 7 | 0.95 | 20.6 | 19.16 | 0 | 7.32 | 15.73 | 24.1 | 85.11 | ▇▅▁▁▁ |
Cleaned_Accra_MMDAs_Data %>% skim(TtRev_Growth_Rate)
| Name | Piped data |
| Number of rows | 134 |
| Number of columns | 85 |
| _______________________ | |
| Column type frequency: | |
| numeric | 1 |
| ________________________ | |
| Group variables | None |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| TtRev_Growth_Rate | 17 | 0.87 | 2.44 | 163.39 | -1726.55 | 2.57 | 14.95 | 27.69 | 89.91 | ▁▁▁▁▇ |
# Histograms
ggplot(Cleaned_Accra_MMDAs_Data, aes(x = Capital_Exp_Per_Capita)) +
geom_histogram(bins = 10, fill = "dodgerblue", color = "black") +
labs(title = "Distribution of Capital expenditure per capita", x = "Capital expenditure per capita") +
scale_x_continuous(labels = comma)
ggplot(Cleaned_Accra_MMDAs_Data, aes(x = TtRev_Growth_Rate)) +
geom_histogram(bins = 10, fill = "dodgerblue", color = "black") +
labs(title = "Distribution of Total Revenue Growth Rate", x = "Total revenue growth rate")
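The histograms show skewed distributions: capital expenditure per capita is right-skewed, while total revenue growth rate is left-skewed, with an extreme negative minimum (-1726.55).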
mod5 <- lm(Capital_Exp_Per_Capita ~ TtRev_Growth_Rate, data = Cleaned_Accra_MMDAs_Data)
summary(mod5)
##
## Call:
## lm(formula = Capital_Exp_Per_Capita ~ TtRev_Growth_Rate, data = Cleaned_Accra_MMDAs_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -22.244 -12.992 -5.474 3.061 63.001
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 22.104404 1.846949 11.968 <0.0000000000000002 ***
## TtRev_Growth_Rate -0.003893 0.011173 -0.348 0.728
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 19.63 on 111 degrees of freedom
## (21 observations deleted due to missingness)
## Multiple R-squared: 0.001092, Adjusted R-squared: -0.007907
## F-statistic: 0.1214 on 1 and 111 DF, p-value: 0.7282
ggplot(Cleaned_Accra_MMDAs_Data, aes(x = TtRev_Growth_Rate, y = Capital_Exp_Per_Capita)) +
geom_point() +
geom_smooth(method = "lm", se = TRUE)+
labs(title = "Revenue Growth vs. Capital Expenditure (Per Capita)",
x = "Total Revenue Growth Rate (%)",
y = "Capital Expenditure Per Capita")
The regression results show no statistically significant relationship between total revenue growth rate and infrastructure delivery (capital expenditure per capita): the p-value (0.7282) is well above the 0.05 significance level. Changes in revenue growth therefore do not significantly predict changes in capital expenditure per capita in this model. The R-squared (0.001092) indicates that only about 0.11% of the variation in capital expenditure per capita is explained by the total revenue growth rate.
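The same conclusion can be read from a confidence interval for the slope. The sketch below rebuilds the 95% interval, the t value, and the two-sided p-value from the rounded estimate, standard error, and residual degrees of freedom in the summary(mod5) output. The variable names are illustrative.

```r
# Values copied from the summary(mod5) output above
estimate  <- -0.003893
std_error <- 0.011173
df_resid  <- 111

# 95% confidence interval for the slope
t_crit <- qt(0.975, df = df_resid)
ci <- estimate + c(-1, 1) * t_crit * std_error
ci

# t value and two-sided p-value recovered from the same quantities
t_value <- estimate / std_error
p_value <- 2 * pt(-abs(t_value), df = df_resid)
c(t_value = t_value, p_value = p_value)
```

The interval contains zero, which agrees with the non-significant p-value. With the fitted model object available, confint(mod5) would give the interval directly.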
Cleaned_Accra_MMDAs_Data$Expenditure_Growth <- c(NA, diff(Cleaned_Accra_MMDAs_Data$Total_Expenditure) / Cleaned_Accra_MMDAs_Data$Total_Expenditure[-nrow(Cleaned_Accra_MMDAs_Data)]) * 100
ggplot(Cleaned_Accra_MMDAs_Data, aes(x = Expenditure_Growth, y = Capital_Exp_Per_Capita)) +
geom_point() + geom_smooth(method = "lm", se = TRUE)+
labs(title = "Expenditure Growth vs. Capital Expenditure (Per Capita)",
x = "Expenditure Growth Rate (%)",
y = "Capital Expenditure Per Capita")
There is no statistically significant linear relationship between expenditure growth rate and capital expenditure per capita.
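One caveat about growth rates computed with diff() on the whole column: if the rows stack several assemblies on top of each other, the first year of each assembly is compared with the last year of the previous one. The sketch below shows a grouped alternative in base R. It assumes an identifier column, here called MMDA, which is a hypothetical name, and uses made-up toy values rather than the MMDA data.

```r
# Toy panel (hypothetical values and a hypothetical MMDA identifier column)
toy <- data.frame(
  MMDA = c("A", "A", "A", "B", "B"),
  Year = c(2012, 2013, 2014, 2012, 2013),
  Total_Expenditure = c(100, 110, 121, 200, 150)
)

# Sort so that differences are taken between consecutive years within each MMDA
toy <- toy[order(toy$MMDA, toy$Year), ]

# Percentage growth relative to the previous year; NA for each MMDA's first year
growth_pct <- function(v) c(NA, diff(v) / head(v, -1) * 100)

toy$Expenditure_Growth <- ave(toy$Total_Expenditure, toy$MMDA, FUN = growth_pct)
toy
```

Whether this applies depends on how Cleaned_Accra_MMDAs_Data is actually laid out, which the output above does not show.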
# no variables
# Trends of Revenue and Expenditure over the years.
ggplot(Cleaned_Accra_MMDAs_Data, aes(x = Year, y = Total_Revenue)) +
geom_point(color = "blue") +
geom_smooth(method = "lm", se = TRUE, color = "red", linetype = "dashed") +
labs(title = "Total Revenue Trend",
x = "Year (2012 - 2022)",
y = "Amount (Ghana Cedis)") +
scale_y_continuous(labels = comma)
ggplot(Cleaned_Accra_MMDAs_Data, aes(x = Year, y = Total_Revenue)) +
geom_bar(stat = "identity", fill = "dodgerblue") +
labs(title = "Total Revenue Trend",
x = "Year",
y = "Amount (Ghana Cedis)") +
scale_y_continuous(labels = comma)
ggplot(Cleaned_Accra_MMDAs_Data, aes(x = Year, y = Total_Expenditure)) +
geom_point(color = "blue") +
geom_smooth(method = "lm", se = TRUE, color = "red", linetype = "dashed") +
labs(
title = "Total Expenditure Trend",
x = "Year (2012-2022)",
y = "Amount (Ghana Cedis)"
) +
theme(plot.title = element_text(hjust = 0.5))+
scale_y_continuous(labels = comma)
ggplot(Cleaned_Accra_MMDAs_Data, aes(x = Year, y = Total_Expenditure)) +
geom_bar(stat = "identity", fill = "dodgerblue") +
labs(title = "Total Expenditure Trend",
x = "Year",
y = "Amount (Ghana Cedis)") +
scale_y_continuous(labels = comma)
ggplot(Cleaned_Accra_MMDAs_Data, aes(x = Year)) +
geom_point(aes(y = Total_Revenue, color = "Total Revenue")) +
geom_point(aes(y = Total_Expenditure, color = "Total Expenditure")) +
labs(title = "Revenue Vs. Expenditure Trends Over Years",
x = "Year",
y = "Amount (Ghana Cedis)", color = "Type") +
scale_color_manual(values = c("Total Revenue" = "blue", "Total Expenditure" = "red")) +
scale_y_continuous(labels = comma)
ggplot(Cleaned_Accra_MMDAs_Data, aes(x = Total_Revenue, y = Total_Expenditure)) +
geom_point(color = "blue") +
labs( title = "Total Revenue Vs. Total Expenditure (Ghana Cedis)",
x = "Total Revenue", y = "Total Expenditure ") +
theme(plot.title = element_text(hjust = 0.5))+
scale_y_continuous(labels = comma) +
scale_x_continuous(labels = comma)
ggplot(Cleaned_Accra_MMDAs_Data, aes(x = Year)) +
geom_point(aes(y = IGF, color = "IGF")) +
geom_point(aes(y = DACF, color = "DACF")) +
geom_point(aes(y = Capital_Expenditure, color = "Capital Expenditure")) +
geom_point(aes(y = Others_Sources, color = "Other Sources")) +
labs(
title = "Revenue Trends",
x = "Year",
y = "Amount (Ghana Cedis)",
color = "Type"
) +
scale_color_manual(
values = c(
"Total Revenue" = "#0000FF", # Blue
"Other Sources" = "#87CEEB", # Light Blue
"IGF" = "#00CD66", # Green
"DACF" = "#808080", # Gray
"Capital Expenditure" = "#9370DB", # Purple
"Total Expenditure" = "#FF0000", # Red
"Recurrent Expenditure" = "#FFD700" # Yellow
)
) +
scale_y_continuous(labels = comma, breaks = seq(0, 60000000, 10000000)) + # Added breaks
theme(
legend.position = "right",
legend.title = element_text(face = "bold"),
plot.title = element_text(hjust = 0.5, face = "bold")
)
# IGF to Total Expenditure Ratio
ggplot(Cleaned_Accra_MMDAs_Data, aes(x = Year, y = IGF_TE)) +
geom_point(size = 2.5) +
labs(
title = "IGF to Total Expenditure Ratio Over Years",
x = "Year",
y = "Ratio (IGF/Total Expenditure)"
) +
scale_y_continuous(labels = percent_format(accuracy = 1))
cor.test(Cleaned_Accra_MMDAs_Data$Total_Expenditure, Cleaned_Accra_MMDAs_Data$Total_Revenue)
##
## Pearson's product-moment correlation
##
## data: Cleaned_Accra_MMDAs_Data$Total_Expenditure and Cleaned_Accra_MMDAs_Data$Total_Revenue
## t = 42.566, df = 127, p-value < 0.00000000000000022
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.9531005 0.9763956
## sample estimates:
## cor
## 0.9666943
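The t statistic in the cor.test() output follows directly from the correlation and its degrees of freedom, t = r·sqrt(df) / sqrt(1 − r²). The sketch below reproduces it from the rounded values printed above, as a worked check. The variable names are illustrative.

```r
# Values copied from the cor.test() output above
r  <- 0.9666943
df <- 127          # n - 2, so the test used 129 complete pairs

# t statistic for testing whether the true correlation is zero
t_stat <- r * sqrt(df) / sqrt(1 - r^2)
t_stat

# Share of variance in one variable linearly associated with the other
r^2
```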
# Revenue Per Capita
Cleaned_Accra_MMDAs_Data$Total_Revenue_Per_Capita <- Cleaned_Accra_MMDAs_Data$Total_Revenue / Cleaned_Accra_MMDAs_Data$Population
Cleaned_Accra_MMDAs_Data$IGF_Per_Capita <- Cleaned_Accra_MMDAs_Data$IGF / Cleaned_Accra_MMDAs_Data$Population
Cleaned_Accra_MMDAs_Data$DACF_Per_Capita <- Cleaned_Accra_MMDAs_Data$DACF / Cleaned_Accra_MMDAs_Data$Population
ggplot(Cleaned_Accra_MMDAs_Data, aes(x = Year)) +
geom_point(aes(y = IGF, color = "IGF")) +
geom_point(aes(y = DACF, color = "DACF")) +
geom_point(aes(y = Others_Sources, color = "Other Sources")) +
labs(
title = "Revenue Trends",
x = "Year",
y = "Amount (Ghana Cedis)",
color = "Type"
) +
scale_color_manual(
values = c(
"Total Revenue" = "#0000FF", # Blue
"Other Sources" = "#87CEEB", # Light Blue
"IGF" = "#00CD66", # Green
"DACF" = "#808080", # Gray
"Capital Expenditure" = "#9370DB", # Purple
"Total Expenditure" = "#FF0000", # Red
"Recurrent Expenditure" = "#FFD700" # Yellow
)
) +
scale_y_continuous(labels = comma, breaks = seq(0, 60000000, 10000000)) + # Added breaks
theme(
legend.position = "right",
legend.title = element_text(face = "bold"),
plot.title = element_text(hjust = 0.5, face = "bold")
)
# Total Expenditure, Population, and IGF Trends
ggplot(Cleaned_Accra_MMDAs_Data, aes(x = Year, y = Total_Expenditure)) +
geom_bar(stat = "identity", fill = "dodgerblue") +
geom_point()+
labs(title = "Total Expenditure Trend",
x = "Year",
y = "Amount (Ghana Cedis)") +
scale_y_continuous(labels = comma)
ggplot(Cleaned_Accra_MMDAs_Data, aes(x = Year, y = Population)) +
geom_bar(stat = "identity", fill = "dodgerblue") +
geom_point()+
labs(title = "Population Trend",
x = "Year",
y = "Population")
ggplot(Cleaned_Accra_MMDAs_Data, aes(x = Year, y = IGF)) +
geom_bar(stat = "identity", fill = "dodgerblue") +
geom_point()+
labs(title = "IGF Trend",
x = "Year",
y = "IGF") +
scale_y_continuous(labels = comma)
# Per capita plot
ggplot(Cleaned_Accra_MMDAs_Data, aes(x = Year)) +
geom_line(aes(y = Total_Revenue_Per_Capita, color = "Total Revenue Per Capita")) +
geom_point(aes(y = Total_Revenue_Per_Capita, color = "Total Revenue Per Capita")) +
geom_line(aes(y = IGF_Per_Capita, color = "IGF Per Capita")) +
geom_point(aes(y = IGF_Per_Capita, color = "IGF Per Capita")) +
geom_line(aes(y = DACF_Per_Capita, color = "DACF Per Capita")) +
geom_point(aes(y = DACF_Per_Capita, color = "DACF Per Capita")) +
labs(title = "Revenue Per Capita trends", x = "Year", y = "Amount (Ghana Cedis)", color = "Type") +
scale_y_continuous(labels = comma)
cor_matrix <- cor(Cleaned_Accra_MMDAs_Data[, c("Population", "Total_Revenue", "Total_Expenditure", "IGF_TE", "IGF")], use = "complete.obs")
print(cor_matrix)
## Population Total_Revenue Total_Expenditure IGF_TE
## Population 1.0000000 0.40021799 0.421275523 0.098553597
## Total_Revenue 0.4002180 1.00000000 0.970273216 0.083922559
## Total_Expenditure 0.4212755 0.97027322 1.000000000 0.001271894
## IGF_TE 0.0985536 0.08392256 0.001271894 1.000000000
## IGF 0.3859802 0.88365092 0.852844940 0.381545990
## IGF
## Population 0.3859802
## Total_Revenue 0.8836509
## Total_Expenditure 0.8528449
## IGF_TE 0.3815460
## IGF 1.0000000
corrplot(cor_matrix, main = "Correlation matrix of population and expenditure patterns")
The correlation matrix shows a strong positive correlation between total revenue and total expenditure (r = 0.97), and between IGF and both total revenue (r = 0.88) and total expenditure (r = 0.85).
# Total Revenue vs Population
model_revenue_pop <- lm(Total_Revenue ~ Population, data = Cleaned_Accra_MMDAs_Data)
summary(model_revenue_pop)
##
## Call:
## lm(formula = Total_Revenue ~ Population, data = Cleaned_Accra_MMDAs_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -10217220 -3555800 -479229 2403423 16952672
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 6145864.309 1001564.430 6.136 0.00000000934 ***
## Population 25.536 5.278 4.838 0.00000361706 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5173000 on 131 degrees of freedom
## (1 observation deleted due to missingness)
## Multiple R-squared: 0.1516, Adjusted R-squared: 0.1451
## F-statistic: 23.41 on 1 and 131 DF, p-value: 0.000003617
ggplot(Cleaned_Accra_MMDAs_Data, aes(x = Population, y = Total_Revenue)) +
geom_point() +
geom_smooth(method = "lm", se = TRUE) +
labs(title = "Total Revenue vs Population", x = "Population", y = "Total Revenue") +
scale_x_continuous(labels = comma) +
scale_y_continuous(labels = comma)
# Total Expenditure vs Population
model_expenditure_pop <- lm(Total_Expenditure ~ Population, data = Cleaned_Accra_MMDAs_Data)
summary(model_expenditure_pop)
##
## Call:
## lm(formula = Total_Expenditure ~ Population, data = Cleaned_Accra_MMDAs_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -10237545 -3697052 1677 2911901 13515110
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5557076.955 1057168.243 5.257 0.000000608 ***
## Population 26.284 5.489 4.789 0.000004627 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5268000 on 126 degrees of freedom
## (6 observations deleted due to missingness)
## Multiple R-squared: 0.154, Adjusted R-squared: 0.1473
## F-statistic: 22.93 on 1 and 126 DF, p-value: 0.000004627
ggplot(Cleaned_Accra_MMDAs_Data, aes(x = Population, y = Total_Expenditure)) +
geom_point() +
geom_smooth(method = "lm", se = TRUE) +
labs(title = "Total Expenditure vs Population", x = "Population", y = "Total Expenditure") +
scale_x_continuous(labels = comma) +
scale_y_continuous(labels = comma)
# Capital Expenditure vs Total Revenue and IGF_TE
model_capital_rev_igf <- lm(Capital_Expenditure ~ Total_Revenue + IGF_TE, data = Cleaned_Accra_MMDAs_Data)
summary(model_capital_rev_igf)
##
## Call:
## lm(formula = Capital_Expenditure ~ Total_Revenue + IGF_TE, data = Cleaned_Accra_MMDAs_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4101349 -954608 -37233 693716 9008776
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 637359.33446 406877.83501 1.566 0.120
## Total_Revenue 0.25936 0.02704 9.592 <0.0000000000000002 ***
## IGF_TE -1063875.71009 651330.29426 -1.633 0.105
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1684000 on 119 degrees of freedom
## (12 observations deleted due to missingness)
## Multiple R-squared: 0.4385, Adjusted R-squared: 0.429
## F-statistic: 46.46 on 2 and 119 DF, p-value: 0.000000000000001225
ggplot(Cleaned_Accra_MMDAs_Data, aes(x = Total_Revenue, y = Capital_Expenditure)) +
geom_point() +
geom_smooth(method = "lm", se = TRUE) +
labs(title = "Capital Expenditure vs Total Revenue", x = "Total Revenue", y = "Capital Expenditure") +
scale_x_continuous(labels = comma) +
scale_y_continuous(labels = comma)
ggplot(Cleaned_Accra_MMDAs_Data, aes(x = Population, y = IGF_TE)) +
geom_point() +
geom_smooth(method = "lm", se = TRUE) +
labs(title = "IGF_TE vs Population", x = "Population", y = "IGF_TE") +
scale_x_continuous(labels = comma) +
scale_y_continuous(labels = percent_format(accuracy = 1))
# IGF_TE vs Population and Total Revenue
model_igfte_pop_rev <- lm(IGF_TE ~ Population + Total_Revenue, data = Cleaned_Accra_MMDAs_Data)
summary(model_igfte_pop_rev)
##
## Call:
## lm(formula = IGF_TE ~ Population + Total_Revenue, data = Cleaned_Accra_MMDAs_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.34769 -0.13661 -0.02946 0.11525 1.18077
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.341040867084 0.054460323270 6.262 0.00000000625 ***
## Population 0.000000224692 0.000000288791 0.778 0.438
## Total_Revenue 0.000000002155 0.000000004046 0.533 0.595
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2328 on 119 degrees of freedom
## (12 observations deleted due to missingness)
## Multiple R-squared: 0.01207, Adjusted R-squared: -0.004535
## F-statistic: 0.7269 on 2 and 119 DF, p-value: 0.4856
ggplot(Cleaned_Accra_MMDAs_Data, aes(x = Total_Revenue, y = IGF_TE)) +
geom_point() +
geom_smooth(method = "lm", se = TRUE) +
labs(title = "IGF_TE vs Total Revenue", x = "Total Revenue", y = "IGF_TE") +
scale_x_continuous(labels = comma) +
scale_y_continuous(labels = percent_format(accuracy = 1))
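The regression results above show significant linear relationships between Total Revenue and Population, between Total Expenditure and Population, and between Capital Expenditure and Total Revenue. In contrast, neither Population nor Total Revenue is a significant predictor of IGF_TE (overall p-value: 0.4856), and IGF_TE is not a significant predictor of Capital Expenditure once Total Revenue is included (p = 0.105).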
# no variables
ggplot(Cleaned_Accra_MMDAs_Data, aes(x = factor(Year), y = IGF)) +
geom_point(color = "dodgerblue")+
labs(title = "IGF Trend",
x = "Year",
y = "IGF (Ghana Cedis) ") +
scale_y_continuous(labels = comma)
# Land-Based Revenue Trends
ggplot(Cleaned_Accra_MMDAs_Data, aes(x = Year)) +
geom_point(aes(y = Act_Permit, color = "Permit Fees")) +
geom_point(aes(y = Act_Property_Rates, color = "Property Rates")) +
geom_point(aes(y = Act_Stool_Lands, color = "Stool Lands Revenue")) +
geom_point(aes(y = Act_Licenses, color = "Licenses")) +
geom_point(aes(y = Act_Fees, color = "Act Fees")) +
labs(
title = "Land-Based Revenue Over Years",
x = "Year",
y = "Revenue (Ghana Cedis)",
color = "Revenue Type"
) +
scale_y_continuous(labels = comma) +
scale_color_brewer(palette = "Set1")+
theme(
legend.position = "right",
legend.title = element_text(face = "bold"),
plot.title = element_text(hjust = 0.5, face = "bold")
)
The plot above shows how each land-based revenue source changes over the years.
# IGF vs Land-Based Revenues
model_igf_land <- lm(IGF ~ Act_Permit + Act_Property_Rates + Act_Stool_Lands + Act_Licenses + Act_Fees, data = Cleaned_Accra_MMDAs_Data)
summary(model_igf_land)
##
## Call:
## lm(formula = IGF ~ Act_Permit + Act_Property_Rates + Act_Stool_Lands +
## Act_Licenses + Act_Fees, data = Cleaned_Accra_MMDAs_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -454947 -192315 -69181 124763 833595
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 241248.22053 84370.11485 2.859 0.006135 **
## Act_Permit 1.10695 0.04943 22.396 < 0.0000000000000002 ***
## Act_Property_Rates 0.90705 0.04616 19.651 < 0.0000000000000002 ***
## Act_Stool_Lands 1.21033 0.13686 8.844 0.0000000000071549 ***
## Act_Licenses 1.13073 0.10512 10.756 0.0000000000000101 ***
## Act_Fees 0.72379 0.19023 3.805 0.000381 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 306300 on 51 degrees of freedom
## (77 observations deleted due to missingness)
## Multiple R-squared: 0.9924, Adjusted R-squared: 0.9916
## F-statistic: 1324 on 5 and 51 DF, p-value: < 0.00000000000000022
cor_matrix_land_igf <- cor(Cleaned_Accra_MMDAs_Data[, c("IGF", "Act_Permit", "Act_Property_Rates", "Act_Stool_Lands", "Act_Licenses", "Act_Fees")], use = "complete.obs")
print(cor_matrix_land_igf)
## IGF Act_Permit Act_Property_Rates Act_Stool_Lands
## IGF 1.0000000 0.8207077 0.91746328 0.11180082
## Act_Permit 0.8207077 1.0000000 0.65497158 -0.17173461
## Act_Property_Rates 0.9174633 0.6549716 1.00000000 -0.04099322
## Act_Stool_Lands 0.1118008 -0.1717346 -0.04099322 1.00000000
## Act_Licenses 0.8928748 0.6152421 0.80707668 0.14244048
## Act_Fees 0.5400054 0.2593373 0.43209716 0.45235115
## Act_Licenses Act_Fees
## IGF 0.8928748 0.5400054
## Act_Permit 0.6152421 0.2593373
## Act_Property_Rates 0.8070767 0.4320972
## Act_Stool_Lands 0.1424405 0.4523512
## Act_Licenses 1.0000000 0.5112223
## Act_Fees 0.5112223 1.0000000
corrplot(cor_matrix_land_igf)
The multiple regression of IGF on the land-based revenues (permit fees, property rates, stool lands revenue, licenses, and fees) is statistically significant overall (F = 1324, p < 0.001), with an R-squared of 0.9924: 99.24% of the variation in IGF is explained by these revenue sources. All individual terms in the model are also significant. The model is fitted on only 57 complete observations, because 77 were dropped due to missing values.
The correlation matrix shows that IGF is strongly correlated with property rates (r = 0.92), licenses (r = 0.89), and permit fees (r = 0.82), moderately correlated with fees (r = 0.54), and only weakly correlated with stool lands revenue (r = 0.11).
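Several of the predictors are also correlated with each other (for example property rates and licenses, r = 0.81), so a variance inflation factor check may be worth running on the multiple regression. The sketch below computes VIFs by hand in base R, using VIF_j = 1 / (1 − R²_j), on simulated toy predictors rather than the MMDA data. The helper name vif_by_hand is illustrative and is not the car::vif() function called in perform_diagnostics.

```r
# VIF_j = 1 / (1 - R2_j), where R2_j comes from regressing predictor j
# on all the other predictors
vif_by_hand <- function(predictors) {
  sapply(names(predictors), function(v) {
    others <- setdiff(names(predictors), v)
    r2 <- summary(lm(reformulate(others, response = v), data = predictors))$r.squared
    1 / (1 - r2)
  })
}

# Simulated toy predictors (not the MMDA data):
# x2 is built from x1, x3 is independent of both
set.seed(123)
toy <- data.frame(x1 = rnorm(200))
toy$x2 <- toy$x1 + rnorm(200, sd = 0.5)
toy$x3 <- rnorm(200)

vifs <- vif_by_hand(toy)
vifs
```

Values close to 1 indicate little overlap with the other predictors, and larger values indicate that a coefficient's standard error is inflated by collinearity.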
# Simple Linear Regression Analysis
model_permit <- lm(IGF ~ Act_Permit, data = Cleaned_Accra_MMDAs_Data)
model_property <- lm(IGF ~ Act_Property_Rates, data = Cleaned_Accra_MMDAs_Data)
model_stool <- lm(IGF ~ Act_Stool_Lands, data = Cleaned_Accra_MMDAs_Data)
model_license <- lm(IGF ~ Act_Licenses, data = Cleaned_Accra_MMDAs_Data)
model_acts <- lm(IGF ~ Act_Fees, data = Cleaned_Accra_MMDAs_Data)
# Visualizations
# Scatter plots (IGF vs each land-based revenue)
ggplot(Cleaned_Accra_MMDAs_Data, aes(x = Act_Permit, y = IGF)) +
geom_point() +
geom_smooth(method = "lm", se = TRUE) +
labs(title = "IGF vs Permit Fees", x = "Permit Fees", y = "IGF") +
scale_x_continuous(labels = comma) +
scale_y_continuous(labels = comma)
summary(model_permit)
##
## Call:
## lm(formula = IGF ~ Act_Permit, data = Cleaned_Accra_MMDAs_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3605925 -991741 -549816 282807 7292429
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1223302.6721 226377.8701 5.404 0.000000321 ***
## Act_Permit 1.9103 0.1048 18.226 < 0.0000000000000002 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1849000 on 124 degrees of freedom
## (8 observations deleted due to missingness)
## Multiple R-squared: 0.7282, Adjusted R-squared: 0.726
## F-statistic: 332.2 on 1 and 124 DF, p-value: < 0.00000000000000022
ggplot(Cleaned_Accra_MMDAs_Data, aes(x = Act_Property_Rates, y = IGF)) +
geom_point() +
geom_smooth(method = "lm", se = TRUE) +
labs(title = "IGF vs Property Rates", x = "Property Rates", y = "IGF") +
scale_x_continuous(labels = comma) +
scale_y_continuous(labels = comma)
summary(model_property)
##
## Call:
## lm(formula = IGF ~ Act_Property_Rates, data = Cleaned_Accra_MMDAs_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4523479 -1235927 -367799 821667 5453414
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1940604.3136 186019.8343 10.43 <0.0000000000000002 ***
## Act_Property_Rates 2.2765 0.1155 19.72 <0.0000000000000002 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1752000 on 127 degrees of freedom
## (5 observations deleted due to missingness)
## Multiple R-squared: 0.7538, Adjusted R-squared: 0.7519
## F-statistic: 388.8 on 1 and 127 DF, p-value: < 0.00000000000000022
ggplot(Cleaned_Accra_MMDAs_Data, aes(x = Act_Stool_Lands, y = IGF)) +
geom_point() +
geom_smooth(method = "lm", se = TRUE) +
labs(title = "IGF vs Stool Lands Revenue", x = "Stool Lands Revenue", y = "IGF") +
scale_x_continuous(labels = comma) +
scale_y_continuous(labels = comma)
summary(model_stool)
##
## Call:
## lm(formula = IGF ~ Act_Stool_Lands, data = Cleaned_Accra_MMDAs_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3042025 -2091310 -806335 392962 12813674
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3503381.0113 454331.2369 7.711 0.000000000208 ***
## Act_Stool_Lands 0.7618 1.1305 0.674 0.503
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3317000 on 57 degrees of freedom
## (75 observations deleted due to missingness)
## Multiple R-squared: 0.007903, Adjusted R-squared: -0.009503
## F-statistic: 0.454 on 1 and 57 DF, p-value: 0.5031
ggplot(Cleaned_Accra_MMDAs_Data, aes(x = Act_Licenses, y = IGF)) +
geom_point() +
geom_smooth(method = "lm", se = TRUE) +
labs(title = "IGF vs Licenses", x = "Licenses", y = "IGF") +
scale_x_continuous(labels = comma) +
scale_y_continuous(labels = comma)
summary(model_license)
##
## Call:
## lm(formula = IGF ~ Act_Licenses, data = Cleaned_Accra_MMDAs_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -7334380 -942477 -223917 798847 6479473
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 913380.1262 287649.5098 3.175 0.00188 **
## Act_Licenses 2.9719 0.2069 14.362 < 0.0000000000000002 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2179000 on 127 degrees of freedom
## (5 observations deleted due to missingness)
## Multiple R-squared: 0.6189, Adjusted R-squared: 0.6159
## F-statistic: 206.3 on 1 and 127 DF, p-value: < 0.00000000000000022
ggplot(Cleaned_Accra_MMDAs_Data, aes(x = Act_Fees, y = IGF)) +
geom_point() +
geom_smooth(method = "lm", se = TRUE) +
labs(title = "IGF vs Act Fees", x = "Act Fees", y = "IGF") +
scale_x_continuous(labels = comma) +
scale_y_continuous(labels = comma)
summary(model_acts)
##
## Call:
## lm(formula = IGF ~ Act_Fees, data = Cleaned_Accra_MMDAs_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -8432445 -2115747 -671995 823935 10780982
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2517214.6822 394497.0582 6.381 0.00000000301 ***
## Act_Fees 3.5769 0.6718 5.324 0.00000044525 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3192000 on 127 degrees of freedom
## (5 observations deleted due to missingness)
## Multiple R-squared: 0.1825, Adjusted R-squared: 0.176
## F-statistic: 28.35 on 1 and 127 DF, p-value: 0.0000004452
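The simple linear regressions of IGF on each land-based revenue are all statistically significant at the 5% level except the stool lands model (p-value = 0.5031). Property rates (R-squared = 0.7538) and permit fees (R-squared = 0.7282) explain the most variation in IGF.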
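The five fits summarised above can also be collected into one table rather than read from separate `summary()` calls. The following is a minimal sketch in base R. The helper name `fit_simple_models` and the small `toy` data frame are hypothetical; they only stand in for `Cleaned_Accra_MMDAs_Data` so that the block runs on its own.

```r
# Hypothetical helper: fit response ~ predictor for each predictor and
# return slope, p-value, R-squared and the number of rows actually used.
fit_simple_models <- function(data, response, predictors) {
  rows <- lapply(predictors, function(p) {
    fit <- lm(reformulate(p, response = response), data = data)
    s <- summary(fit)
    data.frame(
      predictor = p,
      slope     = unname(coef(fit)[p]),
      p_value   = s$coefficients[p, "Pr(>|t|)"],
      r_squared = s$r.squared,
      n_used    = nobs(fit)
    )
  })
  do.call(rbind, rows)
}

# Toy data (an assumption, not the MMDA data) with one missing value.
set.seed(1)
toy <- data.frame(x1 = rnorm(30), x2 = rnorm(30))
toy$y <- 2 * toy$x1 + rnorm(30)
toy$x2[3] <- NA

simple_results <- fit_simple_models(toy, "y", c("x1", "x2"))
print(simple_results)
```

On the real data the call would presumably look like `fit_simple_models(Cleaned_Accra_MMDAs_Data, "IGF", c("Act_Permit", "Act_Property_Rates", "Act_Stool_Lands", "Act_Licenses", "Act_Fees"))`. Keeping the results in one table also avoids overwriting `model_permit` and the other model objects when the response variable changes.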
# Land-Based Revenue Trends
ggplot(Cleaned_Accra_MMDAs_Data, aes(x = Year)) +
geom_point(aes(y = Act_Permit, color = "Permit Fees")) +
geom_point(aes(y = Act_Property_Rates, color = "Property Rates")) +
geom_point(aes(y = Act_Stool_Lands, color = "Stool Lands Revenue")) +
geom_point(aes(y = Act_Licenses, color = "Licenses")) +
geom_point(aes(y = Act_Fees, color = "Act Fees")) +
labs(
title = "Land-Based Revenue Over Years",
x = "Year (2012 - 2022)",
y = "Revenue (Ghana Cedis)",
color = "Revenue Type"
) +
scale_y_continuous(labels = comma) +
scale_color_brewer(palette = "Set1") +
theme(
legend.position = "right",
legend.title = element_text(face = "bold"),
plot.title = element_text(hjust = 0.5, face = "bold")
)
The plot above shows how each land-based revenue stream changes over the years 2012 to 2022.
# DACF vs Land-Based Revenues
model_DACF_land <- lm(DACF ~ Act_Permit + Act_Property_Rates + Act_Stool_Lands + Act_Licenses + Act_Fees, data = Cleaned_Accra_MMDAs_Data)
summary(model_DACF_land)
##
## Call:
## lm(formula = DACF ~ Act_Permit + Act_Property_Rates + Act_Stool_Lands +
## Act_Licenses + Act_Fees, data = Cleaned_Accra_MMDAs_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2722228 -858571 -146865 837599 3129947
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2898031.31386 357388.20274 8.109 0.0000000000512 ***
## Act_Permit -0.40377 0.21093 -1.914 0.0607 .
## Act_Property_Rates -0.07115 0.20076 -0.354 0.7244
## Act_Stool_Lands -0.59008 0.59361 -0.994 0.3245
## Act_Licenses 0.89466 0.45447 1.969 0.0540 .
## Act_Fees -0.70231 0.79449 -0.884 0.3805
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1336000 on 56 degrees of freedom
## (72 observations deleted due to missingness)
## Multiple R-squared: 0.1111, Adjusted R-squared: 0.0317
## F-statistic: 1.399 on 5 and 56 DF, p-value: 0.2386
cor_matrix_land_DACF <- cor(Cleaned_Accra_MMDAs_Data[, c("DACF", "Act_Permit", "Act_Property_Rates", "Act_Stool_Lands", "Act_Licenses", "Act_Fees")], use = "complete.obs")
print(cor_matrix_land_DACF)
## DACF Act_Permit Act_Property_Rates Act_Stool_Lands
## DACF 1.00000000 -0.1170296 0.01569253 -0.09241002
## Act_Permit -0.11702962 1.0000000 0.64515189 -0.17168996
## Act_Property_Rates 0.01569253 0.6451519 1.00000000 -0.03747740
## Act_Stool_Lands -0.09241002 -0.1716900 -0.03747740 1.00000000
## Act_Licenses 0.09624110 0.6062674 0.80675863 0.14351456
## Act_Fees -0.10157491 0.2431512 0.43440909 0.44820837
## Act_Licenses Act_Fees
## DACF 0.0962411 -0.1015749
## Act_Permit 0.6062674 0.2431512
## Act_Property_Rates 0.8067586 0.4344091
## Act_Stool_Lands 0.1435146 0.4482084
## Act_Licenses 1.0000000 0.5004822
## Act_Fees 0.5004822 1.0000000
corrplot(cor_matrix_land_DACF)
The multiple regression of DACF on all the land-based revenues (permit fees, property rates, stool lands revenue, licenses, and fees (Act_Fees)) is not statistically significant (F-statistic = 1.399, p-value = 0.2386). With an R-squared of 0.1111 and an adjusted R-squared of 0.0317, the model fits poorly. None of the individual terms is significant at the 5% level; permit fees (p = 0.0607) and licenses (p = 0.0540) are significant only at the 10% level.
The correlation matrix shows that DACF is very weakly correlated with all the land-based revenues (every |r| is below 0.12).
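The overall p-value and the adjusted R-squared quoted for the DACF model can be reproduced from the other figures in the `summary()` output. This is a small worked sketch that uses only the rounded numbers printed above, so the last digit may differ slightly from R's own output.

```r
# Values copied from the summary(model_DACF_land) output.
f_stat <- 1.399   # F-statistic
df1    <- 5       # number of predictors
df2    <- 56      # residual degrees of freedom
r2     <- 0.1111  # multiple R-squared

# Overall p-value: upper tail of the F distribution.
p_overall <- pf(f_stat, df1, df2, lower.tail = FALSE)

# Adjusted R-squared penalises R-squared for the number of predictors.
n      <- df1 + df2 + 1            # rows used in the fit (62)
adj_r2 <- 1 - (1 - r2) * (n - 1) / df2

round(c(p_overall = p_overall, adj_r2 = adj_r2), 4)
```

The drop from 0.1111 to roughly 0.03 after adjustment shows how little of the apparent fit remains once the five predictors and the small usable sample are taken into account.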
# Simple linear Regression Analysis
model_permit <- lm(DACF ~ Act_Permit, data = Cleaned_Accra_MMDAs_Data)
model_property <- lm(DACF ~ Act_Property_Rates, data = Cleaned_Accra_MMDAs_Data)
model_stool <- lm(DACF ~ Act_Stool_Lands, data = Cleaned_Accra_MMDAs_Data)
model_license <- lm(DACF ~ Act_Licenses, data = Cleaned_Accra_MMDAs_Data)
model_acts <- lm(DACF ~ Act_Fees, data = Cleaned_Accra_MMDAs_Data)
# Scatter plots (DACF vs each land-based revenue)
ggplot(Cleaned_Accra_MMDAs_Data, aes(x = Act_Permit, y = DACF)) +
geom_point() +
geom_smooth(method = "lm", se = TRUE) +
labs(title = "DACF vs Permit Fees", x = "Permit Fees", y = "DACF") +
scale_x_continuous(labels = comma) +
scale_y_continuous(labels = comma)
summary(model_permit)
##
## Call:
## lm(formula = DACF ~ Act_Permit, data = Cleaned_Accra_MMDAs_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4017366 -1574022 -827996 393760 60526203
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2828145.0550 675326.4940 4.188 0.0000518 ***
## Act_Permit 0.2689 0.3163 0.850 0.397
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5594000 on 129 degrees of freedom
## (3 observations deleted due to missingness)
## Multiple R-squared: 0.005574, Adjusted R-squared: -0.002135
## F-statistic: 0.723 on 1 and 129 DF, p-value: 0.3967
ggplot(Cleaned_Accra_MMDAs_Data, aes(x = Act_Property_Rates, y = DACF)) +
geom_point() +
geom_smooth(method = "lm", se = TRUE) +
labs(title = "DACF vs Property Rates", x = "Property Rates", y = "DACF") +
scale_x_continuous(labels = comma) +
scale_y_continuous(labels = comma)
summary(model_property)
##
## Call:
## lm(formula = DACF ~ Act_Property_Rates, data = Cleaned_Accra_MMDAs_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6785980 -1510098 -716670 597761 59230132
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2499983.6757 569004.5840 4.394 0.0000227 ***
## Act_Property_Rates 0.7559 0.3593 2.104 0.0373 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5468000 on 132 degrees of freedom
## Multiple R-squared: 0.03244, Adjusted R-squared: 0.02511
## F-statistic: 4.426 on 1 and 132 DF, p-value: 0.0373
ggplot(Cleaned_Accra_MMDAs_Data, aes(x = Act_Stool_Lands, y = DACF)) +
geom_point() +
geom_smooth(method = "lm", se = TRUE) +
labs(title = "DACF vs Stool Lands Revenue", x = "Stool Lands Revenue", y = "DACF") +
scale_x_continuous(labels = comma) +
scale_y_continuous(labels = comma)
summary(model_stool)
##
## Call:
## lm(formula = DACF ~ Act_Stool_Lands, data = Cleaned_Accra_MMDAs_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2775614 -995480 -264030 692709 3742800
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2789415.1518 181927.4050 15.333 <0.0000000000000002 ***
## Act_Stool_Lands -0.5348 0.4711 -1.135 0.261
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1384000 on 62 degrees of freedom
## (70 observations deleted due to missingness)
## Multiple R-squared: 0.02036, Adjusted R-squared: 0.00456
## F-statistic: 1.289 on 1 and 62 DF, p-value: 0.2607
ggplot(Cleaned_Accra_MMDAs_Data, aes(x = Act_Licenses, y = DACF)) +
geom_point() +
geom_smooth(method = "lm", se = TRUE) +
labs(title = "DACF vs Licenses", x = "Licenses", y = "DACF") +
scale_x_continuous(labels = comma) +
scale_y_continuous(labels = comma)
summary(model_license)
##
## Call:
## lm(formula = DACF ~ Act_Licenses, data = Cleaned_Accra_MMDAs_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3574584 -1486464 -729379 335831 60285838
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2350959.7169 716449.4267 3.281 0.00132 **
## Act_Licenses 0.7961 0.5221 1.525 0.12971
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5511000 on 132 degrees of freedom
## Multiple R-squared: 0.01731, Adjusted R-squared: 0.009863
## F-statistic: 2.325 on 1 and 132 DF, p-value: 0.1297
ggplot(Cleaned_Accra_MMDAs_Data, aes(x = Act_Fees, y = DACF)) +
geom_point() +
geom_smooth(method = "lm", se = TRUE) +
labs(title = "DACF vs Act Fees", x = "Act Fees", y = "DACF") +
scale_x_continuous(labels = comma) +
scale_y_continuous(labels = comma)
summary(model_acts)
##
## Call:
## lm(formula = DACF ~ Act_Fees, data = Cleaned_Accra_MMDAs_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3105732 -1629029 -735683 410937 60632357
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2755774.894 665023.830 4.144 0.0000606 ***
## Act_Fees 1.029 1.154 0.892 0.374
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5542000 on 132 degrees of freedom
## Multiple R-squared: 0.005988, Adjusted R-squared: -0.001542
## F-statistic: 0.7952 on 1 and 132 DF, p-value: 0.3742
The simple linear regressions show a statistically significant linear relationship between DACF and property rates only (p-value = 0.0373), and even that model explains little of the variation (R-squared = 0.03244). The other four models are not significant.
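A confidence interval makes it easier to see how marginal the property rates result is. The sketch below rebuilds the 95% interval for the slope from the estimate and standard error printed in the `summary(model_property)` output for DACF; `confint(model_property)` on the fitted object should give the same interval up to rounding.

```r
# Values copied from the summary(model_property) output for DACF.
estimate  <- 0.7559   # slope for Act_Property_Rates
std_error <- 0.3593
df_resid  <- 132

# 95% confidence interval: estimate +/- t quantile * standard error.
t_crit <- qt(0.975, df_resid)
ci <- c(lower = estimate - t_crit * std_error,
        upper = estimate + t_crit * std_error)
round(ci, 3)
```

The lower bound sits only just above zero, which is consistent with a p-value close to the 0.05 threshold.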
# Capital_Expenditure Trend
ggplot(Cleaned_Accra_MMDAs_Data, aes(x = Year, y = Capital_Expenditure)) +
geom_point(color = "blue") +
geom_smooth(method = "lm", se = TRUE, color = "red", linetype = "dashed") +
labs(
title = "Trends in Capital Expenditure (Ghana Cedis) Growth ",
x = "Year (2012-2022)",
y = "Capital Expenditure (Ghana Cedis)"
) +
theme(plot.title = element_text(hjust = 0.5)) +
scale_y_continuous(labels = comma)
# Capital_Expenditure vs Land-Based Revenues
model_Capital_Expenditure_land <- lm(Capital_Expenditure ~ Act_Permit + Act_Property_Rates + Act_Stool_Lands + Act_Licenses + Act_Fees, data = Cleaned_Accra_MMDAs_Data)
summary(model_Capital_Expenditure_land)
##
## Call:
## lm(formula = Capital_Expenditure ~ Act_Permit + Act_Property_Rates +
## Act_Stool_Lands + Act_Licenses + Act_Fees, data = Cleaned_Accra_MMDAs_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3344473 -1091134 -688466 676764 9546839
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 703499.0815 590439.7061 1.191 0.23849
## Act_Permit 0.3038 0.3485 0.872 0.38700
## Act_Property_Rates -0.6461 0.3317 -1.948 0.05641 .
## Act_Stool_Lands 0.3717 0.9807 0.379 0.70609
## Act_Licenses 2.1820 0.7508 2.906 0.00523 **
## Act_Fees 1.4277 1.3126 1.088 0.28137
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2207000 on 56 degrees of freedom
## (72 observations deleted due to missingness)
## Multiple R-squared: 0.282, Adjusted R-squared: 0.2179
## F-statistic: 4.398 on 5 and 56 DF, p-value: 0.001899
cor_matrix_land_Capital_Expenditure <- cor(Cleaned_Accra_MMDAs_Data[, c("Capital_Expenditure", "Act_Permit", "Act_Property_Rates", "Act_Stool_Lands", "Act_Licenses", "Act_Fees")], use = "complete.obs")
print(cor_matrix_land_Capital_Expenditure)
## Capital_Expenditure Act_Permit Act_Property_Rates
## Capital_Expenditure 1.0000000 0.2731381 0.2399146
## Act_Permit 0.2731381 1.0000000 0.6451519
## Act_Property_Rates 0.2399146 0.6451519 1.0000000
## Act_Stool_Lands 0.2046033 -0.1716900 -0.0374774
## Act_Licenses 0.4519570 0.6062674 0.8067586
## Act_Fees 0.3464088 0.2431512 0.4344091
## Act_Stool_Lands Act_Licenses Act_Fees
## Capital_Expenditure 0.2046033 0.4519570 0.3464088
## Act_Permit -0.1716900 0.6062674 0.2431512
## Act_Property_Rates -0.0374774 0.8067586 0.4344091
## Act_Stool_Lands 1.0000000 0.1435146 0.4482084
## Act_Licenses 0.1435146 1.0000000 0.5004822
## Act_Fees 0.4482084 0.5004822 1.0000000
corrplot(cor_matrix_land_Capital_Expenditure)
The multiple regression of Capital_Expenditure on all the land-based revenues (permit fees, property rates, stool lands revenue, licenses, and fees (Act_Fees)) is statistically significant (F-statistic = 4.398, p-value = 0.001899), with an R-squared of 0.282 and an adjusted R-squared of 0.2179. Licenses is the only individual term significant at the 5% level (p = 0.00523); property rates is significant only at the 10% level (p = 0.05641).
The correlation matrix shows that Capital_Expenditure is weakly to moderately correlated with the land-based revenues, with r ranging from 0.20 (stool lands revenue) to 0.45 (licenses).
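Both `lm()` with several predictors and `cor(..., use = "complete.obs")` keep only rows where every listed column is present, which is why 72 of the 134 rows are dropped here while the simple regressions lose far fewer. The sketch below illustrates the effect on a toy data frame (an assumption, not the MMDA data).

```r
# Toy data (an assumption, not the MMDA data): 'b' is missing far more
# often than 'a', mimicking a sparsely reported revenue column.
toy <- data.frame(
  y = c(10, 12, 9, 15, 11, 14, 13, 16),
  a = c(1, 2, NA, 4, 5, 6, 7, 8),
  b = c(NA, 3, 1, NA, NA, 2, NA, 5)
)

# Rows usable by lm(y ~ a + b) and by cor(..., use = "complete.obs").
n_joint <- sum(complete.cases(toy[, c("y", "a", "b")]))

# Rows usable by each simple regression.
n_a <- sum(complete.cases(toy[, c("y", "a")]))
n_b <- sum(complete.cases(toy[, c("y", "b")]))

c(joint = n_joint, a_only = n_a, b_only = n_b)
```

Because the joint sample is smaller and may differ in composition, coefficients and significance levels from the multiple regression are not directly comparable with those from the simple regressions.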
# Simple linear Regression Analysis
model_permit <- lm(Capital_Expenditure ~ Act_Permit, data = Cleaned_Accra_MMDAs_Data)
model_property <- lm(Capital_Expenditure ~ Act_Property_Rates, data = Cleaned_Accra_MMDAs_Data)
model_stool <- lm(Capital_Expenditure ~ Act_Stool_Lands, data = Cleaned_Accra_MMDAs_Data)
model_license <- lm(Capital_Expenditure ~ Act_Licenses, data = Cleaned_Accra_MMDAs_Data)
model_acts <- lm(Capital_Expenditure ~ Act_Fees, data = Cleaned_Accra_MMDAs_Data)
# Scatter plots (Capital_Expenditure vs each land-based revenue)
ggplot(Cleaned_Accra_MMDAs_Data, aes(x = Act_Permit, y = Capital_Expenditure)) +
geom_point() +
geom_smooth(method = "lm", se = TRUE) +
labs(title = "Capital_Expenditure vs Permit Fees", x = "Permit Fees", y = "Capital_Expenditure") +
scale_x_continuous(labels = comma) +
scale_y_continuous(labels = comma)
summary(model_permit)
##
## Call:
## lm(formula = Capital_Expenditure ~ Act_Permit, data = Cleaned_Accra_MMDAs_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2611200 -1322427 -541558 916515 11010998
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2009055.6272 249040.9697 8.067 0.000000000000546 ***
## Act_Permit 0.6757 0.1143 5.912 0.000000030982083 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1990000 on 123 degrees of freedom
## (9 observations deleted due to missingness)
## Multiple R-squared: 0.2213, Adjusted R-squared: 0.215
## F-statistic: 34.96 on 1 and 123 DF, p-value: 0.00000003098
ggplot(Cleaned_Accra_MMDAs_Data, aes(x = Act_Property_Rates, y = Capital_Expenditure)) +
geom_point() +
geom_smooth(method = "lm", se = TRUE) +
labs(title = "Capital_Expenditure vs Property Rates", x = "Property Rates", y = "Capital_Expenditure") +
scale_x_continuous(labels = comma) +
scale_y_continuous(labels = comma)
summary(model_property)
##
## Call:
## lm(formula = Capital_Expenditure ~ Act_Property_Rates, data = Cleaned_Accra_MMDAs_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4026612 -1417018 -485935 998508 11934857
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2529003.8700 226761.4063 11.153 < 0.0000000000000002 ***
## Act_Property_Rates 0.5245 0.1402 3.741 0.000277 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2127000 on 126 degrees of freedom
## (6 observations deleted due to missingness)
## Multiple R-squared: 0.09996, Adjusted R-squared: 0.09282
## F-statistic: 13.99 on 1 and 126 DF, p-value: 0.0002772
ggplot(Cleaned_Accra_MMDAs_Data, aes(x = Act_Stool_Lands, y = Capital_Expenditure)) +
geom_point() +
geom_smooth(method = "lm", se = TRUE) +
labs(title = "Capital_Expenditure vs Stool Lands Revenue", x = "Stool Lands Revenue", y = "Capital_Expenditure") +
scale_x_continuous(labels = comma) +
scale_y_continuous(labels = comma)
summary(model_stool)
##
## Call:
## lm(formula = Capital_Expenditure ~ Act_Stool_Lands, data = Cleaned_Accra_MMDAs_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3725328 -1445211 -381290 921618 11594106
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2725662.0606 320967.5397 8.492 0.00000000000557 ***
## Act_Stool_Lands 1.1742 0.8312 1.413 0.163
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2442000 on 62 degrees of freedom
## (70 observations deleted due to missingness)
## Multiple R-squared: 0.03118, Adjusted R-squared: 0.01556
## F-statistic: 1.996 on 1 and 62 DF, p-value: 0.1628
ggplot(Cleaned_Accra_MMDAs_Data, aes(x = Act_Licenses, y = Capital_Expenditure)) +
geom_point() +
geom_smooth(method = "lm", se = TRUE) +
labs(title = "Capital_Expenditure vs Licenses", x = "Licenses", y = "Capital_Expenditure") +
scale_x_continuous(labels = comma) +
scale_y_continuous(labels = comma)
summary(model_license)
##
## Call:
## lm(formula = Capital_Expenditure ~ Act_Licenses, data = Cleaned_Accra_MMDAs_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5600151 -1422028 -385936 1092819 11316701
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2092375.6834 278313.0737 7.518 0.00000000000909 ***
## Act_Licenses 0.8822 0.2017 4.375 0.00002523641127 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2089000 on 126 degrees of freedom
## (6 observations deleted due to missingness)
## Multiple R-squared: 0.1319, Adjusted R-squared: 0.125
## F-statistic: 19.14 on 1 and 126 DF, p-value: 0.00002524
ggplot(Cleaned_Accra_MMDAs_Data, aes(x = Act_Fees, y = Capital_Expenditure)) +
geom_point() +
geom_smooth(method = "lm", se = TRUE) +
labs(title = "Capital_Expenditure vs Act Fees", x = "Act Fees", y = "Capital_Expenditure") +
scale_x_continuous(labels = comma) +
scale_y_continuous(labels = comma)
summary(model_acts)
##
## Call:
## lm(formula = Capital_Expenditure ~ Act_Fees, data = Cleaned_Accra_MMDAs_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3404125 -1588077 -416638 998079 11351413
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2392218.8001 264700.7492 9.037 0.00000000000000238 ***
## Act_Fees 1.4940 0.4504 3.317 0.00119 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2150000 on 126 degrees of freedom
## (6 observations deleted due to missingness)
## Multiple R-squared: 0.08031, Adjusted R-squared: 0.07301
## F-statistic: 11 on 1 and 126 DF, p-value: 0.001189
The simple linear regressions show a statistically significant relationship between capital expenditure and every land-based revenue except stool lands (p-value = 0.1628).
-Recurrent is NA
# Population vs Land-Based Revenues
model_Population_land <- lm(Population ~ Act_Permit + Act_Property_Rates + Act_Stool_Lands + Act_Licenses + Act_Fees, data = Cleaned_Accra_MMDAs_Data)
summary(model_Population_land)
##
## Call:
## lm(formula = Population ~ Act_Permit + Act_Property_Rates + Act_Stool_Lands +
## Act_Licenses + Act_Fees, data = Cleaned_Accra_MMDAs_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -157751 -36144 -22580 28020 191111
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 94526.62403 21940.66157 4.308 0.0000671 ***
## Act_Permit 0.03858 0.01295 2.980 0.00426 **
## Act_Property_Rates -0.01480 0.01232 -1.201 0.23489
## Act_Stool_Lands -0.01371 0.03644 -0.376 0.70825
## Act_Licenses 0.04688 0.02790 1.680 0.09845 .
## Act_Fees 0.02861 0.04877 0.587 0.55987
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 82020 on 56 degrees of freedom
## (72 observations deleted due to missingness)
## Multiple R-squared: 0.3278, Adjusted R-squared: 0.2677
## F-statistic: 5.461 on 5 and 56 DF, p-value: 0.0003669
cor_matrix_land_Population <- cor(Cleaned_Accra_MMDAs_Data[, c("Population", "Act_Permit", "Act_Property_Rates", "Act_Stool_Lands", "Act_Licenses", "Act_Fees")], use = "complete.obs")
print(cor_matrix_land_Population)
## Population Act_Permit Act_Property_Rates Act_Stool_Lands
## Population 1.00000000 0.5303373 0.3614209 -0.03193372
## Act_Permit 0.53033728 1.0000000 0.6451519 -0.17168996
## Act_Property_Rates 0.36142092 0.6451519 1.0000000 -0.03747740
## Act_Stool_Lands -0.03193372 -0.1716900 -0.0374774 1.00000000
## Act_Licenses 0.45353532 0.6062674 0.8067586 0.14351456
## Act_Fees 0.23654875 0.2431512 0.4344091 0.44820837
## Act_Licenses Act_Fees
## Population 0.4535353 0.2365488
## Act_Permit 0.6062674 0.2431512
## Act_Property_Rates 0.8067586 0.4344091
## Act_Stool_Lands 0.1435146 0.4482084
## Act_Licenses 1.0000000 0.5004822
## Act_Fees 0.5004822 1.0000000
corrplot(cor_matrix_land_Population)
The multiple regression of Population on all the land-based revenues (permit fees, property rates, stool lands revenue, licenses, and fees (Act_Fees)) is statistically significant overall (F-statistic = 5.461, p-value = 0.0003669), with an R-squared of 0.3278 and an adjusted R-squared of 0.2677. The only individual term significant at the 5% level is permit fees (p = 0.00426).
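The correlation matrix shows that Population is moderately correlated with permit fees (r = 0.53) and licenses (r = 0.45), weakly correlated with property rates (r = 0.36) and fees (r = 0.24), and essentially uncorrelated with stool lands revenue (r = -0.03).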
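Whether a given correlation differs from zero can be checked with the usual t-test for a Pearson correlation. The sketch below does this by hand for Population and permit fees. The sample size of 62 is inferred from the regression output (56 residual degrees of freedom plus 6 coefficients), since the same six columns define the complete rows; `cor.test()` on the raw columns would run the same test directly.

```r
# Values taken from the output above.
r <- 0.5303   # cor(Population, Act_Permit) on complete rows
n <- 62       # complete rows: 56 residual df + 6 coefficients

# t statistic and two-sided p-value for H0: correlation = 0.
t_stat  <- r * sqrt(n - 2) / sqrt(1 - r^2)
p_value <- 2 * pt(abs(t_stat), df = n - 2, lower.tail = FALSE)

c(t = round(t_stat, 2), p = signif(p_value, 2))
```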
# Simple linear Regression Analysis
model_permit <- lm(Population ~ Act_Permit, data = Cleaned_Accra_MMDAs_Data)
model_property <- lm(Population ~ Act_Property_Rates, data = Cleaned_Accra_MMDAs_Data)
model_stool <- lm(Population ~ Act_Stool_Lands, data = Cleaned_Accra_MMDAs_Data)
model_license <- lm(Population ~ Act_Licenses, data = Cleaned_Accra_MMDAs_Data)
model_acts <- lm(Population ~ Act_Fees, data = Cleaned_Accra_MMDAs_Data)
# Scatter plots (Population vs each land-based revenue)
ggplot(Cleaned_Accra_MMDAs_Data, aes(x = Act_Permit, y = Population)) +
geom_point() +
geom_smooth(method = "lm", se = TRUE) +
labs(title = "Population vs Permit Fees", x = "Permit Fees", y = "Population") +
scale_x_continuous(labels = comma) +
scale_y_continuous(labels = comma)
summary(model_permit)
##
## Call:
## lm(formula = Population ~ Act_Permit, data = Cleaned_Accra_MMDAs_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -182569 -66425 -4712 42154 207304
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 142041.945663 9788.443554 14.511 < 0.0000000000000002 ***
## Act_Permit 0.019378 0.004576 4.235 0.0000433 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 80930 on 128 degrees of freedom
## (4 observations deleted due to missingness)
## Multiple R-squared: 0.1229, Adjusted R-squared: 0.1161
## F-statistic: 17.94 on 1 and 128 DF, p-value: 0.00004329
ggplot(Cleaned_Accra_MMDAs_Data, aes(x = Act_Property_Rates, y = Population)) +
geom_point() +
geom_smooth(method = "lm", se = TRUE) +
labs(title = "Population vs Property Rates", x = "Property Rates", y = "Population") +
scale_x_continuous(labels = comma) +
scale_y_continuous(labels = comma)
summary(model_property)
##
## Call:
## lm(formula = Population ~ Act_Property_Rates, data = Cleaned_Accra_MMDAs_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -156283 -75816 -16231 53945 193530
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 154346.085746 8578.938943 17.99 < 0.0000000000000002 ***
## Act_Property_Rates 0.017555 0.005435 3.23 0.00157 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 82410 on 131 degrees of freedom
## (1 observation deleted due to missingness)
## Multiple R-squared: 0.07376, Adjusted R-squared: 0.06669
## F-statistic: 10.43 on 1 and 131 DF, p-value: 0.001566
ggplot(Cleaned_Accra_MMDAs_Data, aes(x = Act_Stool_Lands, y = Population)) +
geom_point() +
geom_smooth(method = "lm", se = TRUE) +
labs(title = "Population vs Stool Lands Revenue", x = "Stool Lands Revenue", y = "Population") +
scale_x_continuous(labels = comma) +
scale_y_continuous(labels = comma)
summary(model_stool)
##
## Call:
## lm(formula = Population ~ Act_Stool_Lands, data = Cleaned_Accra_MMDAs_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -103278 -61570 -31856 31203 248589
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 176928.78516 12526.28636 14.12 <0.0000000000000002 ***
## Act_Stool_Lands -0.01202 0.03244 -0.37 0.712
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 95320 on 62 degrees of freedom
## (70 observations deleted due to missingness)
## Multiple R-squared: 0.002209, Adjusted R-squared: -0.01388
## F-statistic: 0.1372 on 1 and 62 DF, p-value: 0.7123
ggplot(Cleaned_Accra_MMDAs_Data, aes(x = Act_Licenses, y = Population)) +
geom_point() +
geom_smooth(method = "lm", se = TRUE) +
labs(title = "Population vs Licenses", x = "Licenses", y = "Population") +
scale_x_continuous(labels = comma) +
scale_y_continuous(labels = comma)
summary(model_license)
##
## Call:
## lm(formula = Population ~ Act_Licenses, data = Cleaned_Accra_MMDAs_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -166417 -71357 -14981 44953 208540
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 149078.156520 10903.483199 13.673 <0.0000000000000002 ***
## Act_Licenses 0.020366 0.008052 2.529 0.0126 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 83620 on 131 degrees of freedom
## (1 observation deleted due to missingness)
## Multiple R-squared: 0.04657, Adjusted R-squared: 0.03929
## F-statistic: 6.398 on 1 and 131 DF, p-value: 0.01261
ggplot(Cleaned_Accra_MMDAs_Data, aes(x = Act_Fees, y = Population)) +
geom_point() +
geom_smooth(method = "lm", se = TRUE) +
labs(title = "Population vs Act Fees", x = "Act Fees", y = "Population") +
scale_x_continuous(labels = comma) +
scale_y_continuous(labels = comma)
summary(model_acts)
##
## Call:
## lm(formula = Population ~ Act_Fees, data = Cleaned_Accra_MMDAs_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -119922 -72697 -15951 39918 226064
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 148718.3913 10010.5168 14.856 < 0.0000000000000002 ***
## Act_Fees 0.0520 0.0173 3.005 0.00318 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 82830 on 131 degrees of freedom
## (1 observation deleted due to missingness)
## Multiple R-squared: 0.0645, Adjusted R-squared: 0.05736
## F-statistic: 9.032 on 1 and 131 DF, p-value: 0.00318
The simple linear regressions of Population on each land-based revenue are all statistically significant except the stool lands model (p-value = 0.7123).
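Five separate tests are run against the same response, so one optional check (not part of the original analysis) is to adjust the p-values for multiple comparisons. The sketch below applies the Holm correction with base R's `p.adjust()` to the overall p-values printed for the five Population models.

```r
# Overall p-values copied from the five Population models above.
p_raw <- c(
  Act_Permit         = 0.00004329,
  Act_Property_Rates = 0.001566,
  Act_Stool_Lands    = 0.7123,
  Act_Licenses       = 0.01261,
  Act_Fees           = 0.00318
)

# Holm correction controls the family-wise error rate across the five tests.
p_holm <- p.adjust(p_raw, method = "holm")

data.frame(p_raw, p_holm, significant = p_holm < 0.05)
```

Under this adjustment the same four models remain below 0.05, so the conclusion stated above does not depend on treating the tests independently.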
# no variables